PRODUCTION OF POLYUNSATURATED FATTY ACIDS BY EXPRESSION OF
POLYKETIDE-LIKE SYNTHESIS GENES IN PLANTS 


INTRODUCTION 

Field of the Invention

This invention relates to modulating levels of enzymes and/or enzyme components
capable of modifying long chain poly-unsaturated fatty acids (PUFAs) in a host cell, and
constructs and methods for producing PUFAs in a host cell. The invention is exemplified
by production of eicosapentenoic acid (EPA) using genes derived from Shewanella
putrefaciens and Vibrio marinus.


Background 

Two main families of poly-unsaturated fatty acids (PUFAs) are the ω3 fatty acids,
exemplified by eicosapentenoic acid, and the ω6 fatty acids, exemplified by arachidonic
acid. PUFAs are important components of the plasma membrane of the cell, where they
can be found in such forms as phospholipids, and also can be found in triglycerides.
PUFAs also serve as precursors to other molecules of importance in human beings and
animals, including the prostacyclins, leukotrienes and prostaglandins. Long chain PUFAs
of importance include docosahexenoic acid (DHA) and eicosapentenoic acid (EPA),
which are found primarily in different types of fish oil; gamma-linolenic acid (GLA),
which is found in the seeds of a number of plants, including evening primrose (Oenothera
biennis), borage (Borago officinalis) and black currants (Ribes nigrum); stearidonic acid
(SDA), which is found in marine oils and plant seeds; and arachidonic acid (ARA), which
along with GLA is found in filamentous fungi. ARA can be purified from animal tissues
including liver and adrenal gland. Several genera of marine bacteria are known which
synthesize either EPA or DHA. DHA is present in human milk along with ARA.

PUFAs are necessary for proper development, particularly in the developing infant
brain, and for tissue formation and repair. As an example, DHA is an important
constituent of many human cell membranes, in particular nervous cells (gray matter),
muscle cells, and spermatozoa, and is believed to affect the development of brain functions
in general and to be essential for the development of eyesight. EPA and DHA have a
number of nutritional and pharmacological uses. As an example, adults affected by
diabetes (especially non-insulin-dependent) show deficiencies and imbalances in their
levels of DHA which are believed to contribute to later coronary conditions. Therefore a
diet balanced in DHA may be beneficial to diabetics.

For DHA, a number of sources exist for commercial production including a
variety of marine organisms, oils obtained from cold water marine fish, and egg yolk
fractions. The purification of DHA from fish sources is relatively expensive due to
technical difficulties, making DHA expensive and in short supply. In algae such as
Amphidinium and Schizochytrium and marine fungi such as Thraustochytrium, DHA may
represent up to 48% of the fatty acid content of the cell. A few bacteria also are reported
to produce DHA. These are generally deep sea bacteria such as Vibrio marinus. For
ARA, microorganisms including the genera Mortierella, Entomophthora, Pythium and
Porphyridium can be used for commercial production. Commercial sources of SDA
include the genera Trichodesma and Echium. Commercial sources of GLA include
evening primrose, black currants and borage. However, there are several disadvantages
associated with commercial production of PUFAs from natural sources. Natural sources
of PUFA, such as animals and plants, tend to have highly heterogeneous oil compositions.
The oils obtained from these sources can require extensive purification to separate out one
or more desired PUFA or to produce an oil which is enriched in one or more desired
PUFA.

Natural sources also are subject to uncontrollable fluctuations in availability. Fish
stocks may undergo natural variation or may be depleted by overfishing. Animal oils, and
particularly fish oils, can accumulate environmental pollutants. Weather and disease can
cause fluctuation in yields from both fish and plant sources. Cropland available for
production of alternate oil-producing crops is subject to competition from the steady
expansion of human populations and the associated increased need for food production on
the remaining arable land. Crops which do produce PUFAs, such as borage, have not
been adapted to commercial growth and may not perform well in monoculture. Growth
of such crops is thus not economically competitive where more profitable and better
established crops can be grown. Large-scale fermentation of organisms such as
Shewanella also is expensive. Natural animal tissues contain low amounts of ARA and
are difficult to process. Microorganisms such as Porphyridium and Shewanella are
difficult to cultivate on a commercial scale.

Dietary supplements and pharmaceutical formulations containing PUFAs can
retain the disadvantages of the PUFA source. Supplements such as fish oil capsules can
contain low levels of the particular desired component and thus require large dosages.
High dosages result in ingestion of high levels of undesired components, including
contaminants. Care must be taken in providing fatty acid supplements, as overaddition
may result in suppression of endogenous biosynthetic pathways and lead to competition
with other necessary fatty acids in various lipid fractions in vivo, leading to undesirable
results. For example, Eskimos having a diet high in ω3 fatty acids have an increased
tendency to bleed (U.S. Pat. No. 4,874,603). Fish oils have unpleasant tastes and odors,
which may be impossible to economically separate from the desired product, such as a
food supplement. Unpleasant tastes and odors of the supplements can make
regimens involving the supplement undesirable and may inhibit compliance by the
patient.

A number of enzymes have been identified as being involved in PUFA
biosynthesis. Linoleic acid (LA, 18:2 Δ9, 12) is produced from oleic acid (18:1 Δ9) by a
Δ12-desaturase. GLA (18:3 Δ6, 9, 12) is produced from linoleic acid (LA, 18:2 Δ9, 12)
by a Δ6-desaturase. ARA (20:4 Δ5, 8, 11, 14) is produced from DGLA (20:3 Δ8, 11,
14), catalyzed by a Δ5-desaturase. Eicosapentenoic acid (EPA) is a 20 carbon, omega 3
fatty acid containing 5 double bonds (Δ5, 8, 11, 14, 17), all in the cis configuration. EPA
and the related DHA (Δ4, 7, 10, 13, 16, 19, C22:6) are produced from oleic acid by a
series of elongation and desaturation reactions. Additionally, an elongase (or elongases)
is required to extend the 18 carbon PUFAs out to 20 and 22 carbon chain lengths.
However, animals cannot convert oleic acid (18:1 Δ9) into linoleic acid (18:2 Δ9, 12).
Likewise, α-linolenic acid (ALA, 18:3 Δ9, 12, 15) cannot be synthesized by mammals.
Other eukaryotes, including fungi and plants, have enzymes which desaturate at positions
Δ12 and Δ15. The major poly-unsaturated fatty acids of animals therefore are
derived from diet and/or from desaturation and elongation of linoleic acid (18:2 Δ9, 12)
or α-linolenic acid (18:3 Δ9, 12, 15).
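
The desaturation and elongation steps named above can be traced with a short sketch. The Python below is illustrative only: `FattyAcid`, `desaturate` and `elongate` are hypothetical names, not part of the disclosure, and the model assumes that an elongase adds two carbons at the carboxyl end, so that every Δ position shifts by two (which is how GLA, 18:3 Δ6, 9, 12, corresponds to DGLA, 20:3 Δ8, 11, 14).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FattyAcid:
    """Hypothetical minimal model: chain length plus delta positions of double bonds."""
    carbons: int
    bonds: tuple  # positions counted from the carboxyl end

    def shorthand(self):
        return f"{self.carbons}:{len(self.bonds)} Δ" + ",".join(map(str, self.bonds))

    def omega(self):
        # Omega class = distance of the last double bond from the methyl end.
        return self.carbons - max(self.bonds)


def desaturate(fa, position):
    """Add a double bond at the given delta position (e.g. a delta-12 desaturase)."""
    return FattyAcid(fa.carbons, tuple(sorted(fa.bonds + (position,))))


def elongate(fa):
    """Assumed elongase step: two carbons added at the carboxyl end."""
    return FattyAcid(fa.carbons + 2, tuple(p + 2 for p in fa.bonds))


oleic = FattyAcid(18, (9,))
la = desaturate(oleic, 12)    # delta-12 desaturase
gla = desaturate(la, 6)       # delta-6 desaturase
dgla = elongate(gla)          # elongase
ara = desaturate(dgla, 5)     # delta-5 desaturase
epa = FattyAcid(20, (5, 8, 11, 14, 17))

for name, fa in [("LA", la), ("GLA", gla), ("DGLA", dgla), ("ARA", ara), ("EPA", epa)]:
    print(name, fa.shorthand(), f"omega-{fa.omega()}")
```

Each step in the chain is a separate enzyme activity, which is why the conventional route to EPA or DHA in a plant host requires several transgenes.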

Poly-unsaturated fatty acids are considered to be useful for nutritional,
pharmaceutical, industrial, and other purposes. The supply of poly-unsaturated
fatty acids from natural sources and from chemical synthesis is not sufficient for
commercial needs. Because a number of separate desaturase and elongase enzymes are
required for fatty acid synthesis from linoleic acid (LA, 18:2 Δ9, 12), common in most
plant species, to the more unsaturated and longer chain PUFAs, engineering plant host
cells to produce EPA and DHA may require expression of five or six separate
enzyme activities. For production of quantities of such PUFAs, additional engineering
efforts may be required, for instance the down-regulation of enzymes competing for
substrate, engineering of higher enzyme activities such as by mutagenesis, or targeting
of enzymes to plastid organelles. Therefore
it is of interest to obtain genetic material involved in PUFA biosynthesis from species that
naturally produce these fatty acids and to express the isolated material alone or in
combination in a heterologous system which can be manipulated to allow production of
commercial quantities of PUFAs.

Relevant Literature

Several genera of marine bacteria have been identified which synthesize either
EPA or DHA (DeLong and Yayanos, Applied and Environmental Microbiology (1986)
51: 730-737). Researchers of the Sagami Chemical Research Institute have reported EPA
production in E. coli which have been transformed with a gene cluster from the marine
bacterium Shewanella putrefaciens. A minimum of 5 open reading frames (ORFs) are
required for fatty acid synthesis of EPA in E. coli. To date, extensive characterization of
the functions of the proteins encoded by these genes has not been reported (Yazawa
(1996) Lipids 31: S-297; WO 93/23545; WO 96/21735).

The protein sequence of open reading frame (ORF) 3 as published by Yazawa,
USPN 5,683,898, is not that of a functional protein. Yazawa defines the protein as
initiating at the methionine codon at nucleotides 9016-9014 of the Shewanella PKS-like
cluster (Genbank accession U73935) and ending at the stop codon at nucleotides 8185-8183
of the Shewanella PKS-like cluster. However, when this ORF is expressed under control of
a heterologous promoter in an E. coli strain containing the entire PKS-like cluster except
ORF 3, the recombinant cells do not produce EPA.

Polyketides are secondary metabolites the synthesis of which involves a set of
enzymatic reactions analogous to those of fatty acid synthesis (see reviews: Hopwood
and Sherman, Annu. Rev. Genet. (1990) 24: 37-66, and Katz and Donadio, in Annual
Review of Microbiology (1993) 47: 875-912). It has been proposed to use polyketide
synthases to produce novel antibiotics (Hutchinson and Fujii, Annual Review of
Microbiology (1995) 49: 201-238).

SUMMARY OF THE INVENTION

Novel compositions and methods are provided for preparation of long chain
poly-unsaturated fatty acids (PUFAs) using polyketide-like synthesis (PKS-like) genes in
plants and plant cells. In contrast to the known and proposed methods for production of
PUFAs by means of fatty acid synthesis genes, the invention provides constructs and
methods for producing PUFAs by utilizing genes of a PKS-like system. The methods
involve growing a host cell of interest transformed with an expression cassette functional
in the host cell, the expression cassette comprising a transcriptional and translational
initiation regulatory region, joined in reading frame 5' to a DNA sequence for a gene or
component of a PKS-like system capable of modulating the production of PUFAs
(PKS-like gene). An alteration in the PUFA profile of host cells is achieved by expression
following introduction of a complete PKS-like system responsible for PUFA
biosynthesis into host cells. The invention finds use, for example, in the large scale
production of DHA and EPA and for modification of the fatty acid profile of host cells
and edible plant tissues and/or plant parts.


BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 provides designations for the ORFs of the EPA gene cluster of
Shewanella. Figure 1A shows the organization of the genes; those ORFs essential for
EPA production in E. coli are numbered. Figure 1B shows the designations given to
subclones.

Figure 2 provides the Shewanella PKS-like domain structure, motifs and 'Blast'
matches of ORF 6 (Figure 2A), ORF 7 (Figure 2B), ORF 8 (Figure 2C), ORF 9
(Figure 2D) and ORF 3 (Figure 2E). Figure 2F shows the structure of the region of the
Anabaena chromosome that is related to domains present in Shewanella EPA ORFs.

Figure 3 shows results for pantetheinylation of ORF 3 in E. coli strain SJ16.

Figure 4 is the sequence for the PKS-like cluster found in Shewanella, containing
ORFs 3, 4, 5, 6, 7, 8 and 9. The start and last codons for each ORF are as follows:
ORF 3 (published, inactive): 9016, 8186; ORF 3 (active in EPA synthesis): 9157, 8186;
ORF 6: 13906, 22173; ORF 7: 22203, 24515; ORF 8: 24518, 30529; ORF 9: 30730,
32358.
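
The coordinate pairs listed for Figure 4 can be checked for internal consistency with a few lines of Python. This is a sketch under stated assumptions: each pair is read as the first nucleotide of the start codon and the last nucleotide of the last sense codon (inclusive), and a start coordinate larger than the end coordinate is read as an ORF on the complementary strand. The function name `orf_summary` is hypothetical.

```python
# Coordinates as listed for the Shewanella cluster (Figure 4).
SHEWANELLA_ORFS = {
    "ORF 3 (published, inactive)": (9016, 8186),
    "ORF 3 (active)": (9157, 8186),
    "ORF 6": (13906, 22173),
    "ORF 7": (22203, 24515),
    "ORF 8": (24518, 30529),
    "ORF 9": (30730, 32358),
}


def orf_summary(start, last):
    """Return (strand, length in nt, whole codons, leftover nt) for one ORF."""
    strand = "+" if start <= last else "-"
    length_nt = abs(last - start) + 1
    codons, remainder = divmod(length_nt, 3)
    return strand, length_nt, codons, remainder


for name, (start, last) in SHEWANELLA_ORFS.items():
    strand, length_nt, codons, remainder = orf_summary(start, last)
    print(f"{name}: strand {strand}, {length_nt} nt, {codons} codons, remainder {remainder}")
```

Under these assumptions every listed ORF spans a whole number of codons, and the active ORF 3 is 47 codons longer than the published one.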

Figure 5 shows the sequence for the PKS-like cluster in an approximately 40 kb
DNA fragment of Vibrio marinus, containing ORFs 6, 7, 8 and 9. The start and last
codons for each ORF are as follows: ORF 6: 17394, 25352; ORF 7: 25509, 28160; ORF
8: 28209, 34265; ORF 9: 34454, 36118.

Figure 6 shows the sequence for an approximately 19 kb portion of the PKS-like
cluster of Figure 5 which contains the ORFs 6, 7, 8 and 9. The start and last codons for
each ORF are as follows: ORF 6: 411, 8369; ORF 7: 8526, 11177; ORF 8: 11226, 17282;
ORF 9: 17471, 19135.
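
Because Figure 6 is described as a portion of the Figure 5 sequence, the two coordinate sets should differ by a single constant offset. The short Python check below is a sketch that assumes the coordinates read as listed (with the digit groups of the Figure 6 values joined, e.g. 11177 and 11226); the dictionary names are hypothetical.

```python
# Start and last-codon coordinates as listed for Vibrio marinus.
FIG5 = {"ORF 6": (17394, 25352), "ORF 7": (25509, 28160),
        "ORF 8": (28209, 34265), "ORF 9": (34454, 36118)}
FIG6 = {"ORF 6": (411, 8369), "ORF 7": (8526, 11177),
        "ORF 8": (11226, 17282), "ORF 9": (17471, 19135)}

# Collect the difference between the two numbering schemes for every coordinate.
offsets = {FIG5[name][k] - FIG6[name][k] for name in FIG5 for k in (0, 1)}
print(offsets)
```

A single value in `offsets` indicates that both figures describe the same ORFs in two numbering schemes.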

Figure 7 shows a comparison of the PKS-like gene clusters of Shewanella
putrefaciens and Vibrio marinus; Figure 7B is the Vibrio marinus operon sequence.

Figure 8 is an expanded view of the PKS-like gene cluster portion of Vibrio
marinus shown in Figure 7B showing that ORFs 6, 7 and 8 are in reading frame 2, while
ORF 9 is in reading frame 3.

Figure 9 demonstrates sequence homology of ORF 6 of Shewanella putrefaciens
and Vibrio marinus. The Shewanella ORF 6 is depicted on the vertical axis, and the
Vibrio ORF 6 is depicted on the horizontal axis. Lines indicate regions of the proteins
that have a 60% identity. The repeated lines in the middle correspond to the multiple
ACP domains found in ORF 6.

Figure 10 demonstrates sequence homology of ORF 7 of Shewanella putrefaciens
and Vibrio marinus. The Shewanella ORF 7 is depicted on the vertical axis, and the
Vibrio ORF 7 is depicted on the horizontal axis. Lines indicate regions of the proteins
that have a 60% identity.

Figure 11 demonstrates sequence homology of ORF 8 of Shewanella putrefaciens
and Vibrio marinus. The Shewanella ORF 8 is depicted on the vertical axis, and the
Vibrio ORF 8 is depicted on the horizontal axis. Lines indicate regions of the proteins
that have a 60% identity.

Figure 12 demonstrates sequence homology of ORF 9 of Shewanella putrefaciens
and Vibrio marinus. The Shewanella ORF 9 is depicted on the vertical axis, and the
Vibrio ORF 9 is depicted on the horizontal axis. Lines indicate regions of the proteins
that have a 60% identity.
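
Comparisons of the kind shown in Figures 9 to 12 are commonly drawn as dot plots. The Python below is a minimal sketch of that idea, not the software used to prepare the figures: the window length of 10 residues, the function name `dot_plot` and the toy sequences are all assumptions. It shows why a repeated domain in one protein, such as the multiple ACP domains of ORF 6, appears as several parallel diagonal lines.

```python
def dot_plot(vertical, horizontal, window=10, min_identity=0.6):
    """Return (i, j) window starts where the two windows share >= min_identity."""
    hits = []
    for i in range(len(vertical) - window + 1):
        for j in range(len(horizontal) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(vertical[i:i + window], horizontal[j:j + window])
            )
            if matches / window >= min_identity:
                hits.append((i, j))
    return hits


# Toy example: a hypothetical 10-residue motif repeated twice in one sequence.
motif = "GIDSLMAVEL"
vertical = "PPPP" + motif + "QQQQ" + motif
horizontal = "WW" + motif + "WW"

hits = dot_plot(vertical, horizontal)
# Each distinct value of i - j is one diagonal line in the plot.
diagonals = sorted({i - j for i, j in hits})
print(diagonals)
```

Two diagonals are reported for the toy input, one per copy of the motif.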

Figure 13 is a depiction of various complementation experiments, and resulting
PUFA production. On the right is shown the longest PUFA made in the E. coli strain
containing the Vibrio and Shewanella genes depicted on the left. The hollow boxes
indicate ORFs from Shewanella. The solid boxes indicate ORFs from Vibrio.

Figure 14 is a chromatogram showing fatty acid production from complementation
of pEPADS from Shewanella (deletion ORF 8) with ORF 8 from Shewanella, in E. coli
Fad E-. The chromatogram presents an EPA (20:5) peak.

Figure 15 is a chromatogram showing fatty acid production from complementation
of pEPADS from Shewanella (deletion ORF 8) with ORF 8 from Vibrio marinus, in E.
coli Fad E-. The chromatogram presents EPA (20:5) and DHA (22:6) peaks.

Figure 16 is a table of PUFA values from the ORF 8 complementation
experiment, the chromatogram of which is shown in Figure 15.

Figure 17 is a plasmid map showing the elements of pCGN7770.

Figure 18 is a plasmid map showing the elements of pCGN8535.

Figure 19 is a plasmid map showing the elements of pCGN8537.

Figure 20 is a plasmid map showing the elements of pCGN8525.

Figure 21 is a comparison of the Shewanella ORFs as defined by Yazawa and
those disclosed in Figure 4. When a protein starting at the leucine (TTG) codon at
nucleotides 9157-9155 and ending at the stop codon at nucleotides 8185-8183 is
expressed under control of a heterologous promoter in an E. coli strain containing the
entire PKS-like cluster except ORF 3, the recombinant cells do produce EPA. Thus, the
published protein sequence is likely to be wrong, and the coding sequence for the protein
may start at the TTG codon at nucleotides 9157-9155 or the TTG codon at nucleotides
9172-9170. This information is critical to the expression of a functional PKS-like cluster
in a heterologous system.

Figure 22 is a plasmid map showing the elements of pCGN8560.

Figure 23 is a plasmid map showing the elements of pCGN8556.

Figure 24 shows the translated DNA sequence upstream of the published ORF 3.
The ATG start codon at position 9016 is the start codon for the protein described by
Yazawa et al. (1996) supra. The other arrows depict TTG or ATT codons that can also
serve as start codons in bacteria. When ORF 3 is started from the published ATG codon
at 9016, the protein is not functional in making EPA. When ORF 3 is initiated at the
TTG codon at position 9157, the protein is capable of facilitating EPA synthesis.
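
The search for alternative start codons upstream of an annotated ATG can be sketched as follows. This Python is illustrative only: the function names and the 24-nucleotide toy sequence are hypothetical, and the candidate set is limited to ATG plus the TTG and ATT codons named above. Because ORF 3 is numbered in descending order, the sketch first takes the reverse complement of the deposited strand and then walks upstream in steps of three until an in-frame stop codon is met.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
CANDIDATE_STARTS = {"ATG", "TTG", "ATT"}  # ATG plus the alternatives named in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(_COMPLEMENT)[::-1]


def upstream_starts(coding_strand, known_start):
    """List in-frame candidate start codons 5' of a known start (0-based index)."""
    found = []
    for i in range(known_start - 3, -1, -3):
        codon = coding_strand[i:i + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop closes the upstream reading frame
        if codon in CANDIDATE_STARTS:
            found.append((i, codon))
    return found


# Hypothetical toy ORF: TAA TTG GCA ATT CCC ATG AAA TAA
toy_coding = "TAATTGGCAATTCCCATGAAATAA"
deposited = reverse_complement(toy_coding)   # what a database entry would show
coding = reverse_complement(deposited)       # recover the coding strand
candidates = upstream_starts(coding, known_start=15)
print(candidates)
```

The same arithmetic applies to the coordinates given above: 9157 and 9172 lie 141 and 156 nucleotides upstream of 9016, both multiples of three, so the two TTG codons are in frame with the published ATG.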

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In accordance with the subject invention, novel DNA sequences, DNA constructs
and methods are provided, which include some or all of the polyketide-like synthesis
(PKS-like) pathway genes from Shewanella, Vibrio or other microorganisms, for
modifying the poly-unsaturated long chain fatty acid content of host cells, particularly
host plant cells. The present invention demonstrates that EPA synthesis genes in
Shewanella putrefaciens constitute a polyketide-like synthesis pathway. Functions are
ascribed to the Shewanella and Vibrio genes and methods are provided for the production
of EPA and DHA in host cells. The method includes the step of transforming cells with
an expression cassette comprising a DNA encoding a polypeptide capable of increasing
the amount of one or more PUFA in the host cell. Desirably, integration constructs are
prepared which provide for integration of the expression cassette into the genome of a
host cell. Host cells are manipulated to express a sense or antisense DNA encoding a
polypeptide(s) that has PKS-like gene activity. By "PKS-like gene" is intended a
polypeptide which is responsible for any one or more of the functions of a PKS-like
activity of interest. By "polypeptide" is meant any chain of amino acids, regardless of
length or post-translational modification, for example, glycosylation or phosphorylation.
Depending upon the nature of the host cell, the substrate(s) for the expressed enzyme may
be produced by the host cell or may be exogenously supplied. Of particular interest is the
selective control of PUFA production in plant tissues and/or plant parts such as leaves,
roots, fruits and seeds. The invention can be used to synthesize EPA, DHA, and other
related PUFAs in host cells.

There are many advantages to transgenic production of PUFAs. As an example, in
transgenic E. coli as in Shewanella, EPA accumulates in the phospholipid fraction,
specifically in the sn-1 position. It may be possible to produce a structured lipid in a
desired host cell which differs substantially from that produced in either Shewanella or E.
coli. Additionally, transgenic production of PUFAs in particular host cells offers several
advantages over purification from natural sources such as fish or plants. In transgenic
plants, by utilizing a PKS-like system, fatty acid synthesis of PUFAs is achieved in the
cytoplasm by a system which produces the PUFAs through de novo production of the
fatty acids utilizing malonyl Co-A and acetyl Co-A as substrates. In this fashion,
potential problems, such as those associated with substrate competition and diversion of
normal products of fatty acid synthesis in a host to PUFA production, are avoided.

Production of fatty acids from recombinant plants provides the ability to alter the
naturally occurring plant fatty acid profile by providing new synthetic pathways in the
host or by suppressing undesired pathways, thereby increasing levels of desired PUFAs,
or conjugated forms thereof, and decreasing levels of undesired PUFAs. Production of
fatty acids in transgenic plants also offers the advantage that expression of PKS-like genes
in particular tissues and/or plant parts means that greatly increased levels of desired
PUFAs in those tissues and/or parts can be achieved, making recovery from those tissues
more economical. Expression in a plant tissue and/or plant part presents certain
efficiencies, particularly where the tissue or part is one which is easily harvested, such as
seed, leaves, fruits, flowers, roots, etc. For example, the desired PUFAs can be expressed
in seed; methods of isolating seed oils are well established. In addition to providing a
source for purification of desired PUFAs, seed oil components can be manipulated
through expression of PKS-like genes, either alone or in combination with other genes
such as elongases, to provide seed oils having a particular PUFA profile in concentrated
form. The concentrated seed oils then can be added to animal milks and/or synthetic or
semisynthetic milks to serve as infant formulas where human nursing is impossible or
undesired, or in cases of malnourishment or disease in both adults and infants.

Transgenic microbial production of fatty acids offers the advantages that many
microbes are known with greatly simplified oil compositions as compared with those of
higher organisms, making purification of desired components easier. Microbial
production is not subject to fluctuations caused by external variables such as weather and
food supply. Microbially produced oil is substantially free of contamination by
environmental pollutants. Additionally, microbes can provide PUFAs in particular forms
which may have specific uses. For example, Spirulina can provide PUFAs predominantly
at the first and third positions of triglycerides; digestion by pancreatic lipases
preferentially releases fatty acids from these positions. Following human or animal
ingestion of triglycerides derived from Spirulina, these PUFAs are released by pancreatic
lipases as free fatty acids and thus are directly available, for example, for infant brain
development. Additionally, microbial oil production can be manipulated by controlling
culture conditions, notably by providing particular substrates for microbially expressed
enzymes, or by addition of compounds which suppress undesired biochemical pathways.
In addition to these advantages, production of fatty acids from recombinant microbes
provides the ability to alter the naturally occurring microbial fatty acid profile by
providing new synthetic pathways in the host or by suppressing undesired pathways,
thereby increasing levels of desired PUFAs, or conjugated forms thereof, and decreasing
levels of undesired PUFAs.

Production of fatty acids in animals also presents several advantages. Expression
of desaturase genes in animals can produce greatly increased levels of desired PUFAs in
animal tissues, making recovery from those tissues more economical. For example,
where the desired PUFAs are expressed in the breast milk of animals, methods of
isolating PUFAs from animal milk are well established. In addition to providing a source
for purification of desired PUFAs, animal breast milk can be manipulated through
expression of desaturase genes, either alone or in combination with other human genes, to
provide animal milks with a PUFA composition substantially similar to human breast
milk during the different stages of infant development. Humanized animal milks could
serve as infant formulas where human nursing is impossible or undesired, or in the cases
of malnourishment or disease.

DNAs encoding desired PKS-like genes can be identified in a variety of ways. In
one method, a source of a desired PKS-like gene, for example genomic libraries from a
Shewanella or Vibrio spp., is screened with detectable enzymatically- or chemically-
synthesized probes. Sources of ORFs having PKS-like genes are those organisms which
produce a desired PUFA, including DHA-producing or EPA-producing deep sea bacteria
growing preferentially under high pressure or at relatively low temperature.
Microorganisms such as Shewanella which produce EPA or DHA also can be used as a
source of PKS-like genes. The probes can be made from DNA, RNA, or non-naturally
occurring nucleotides, or mixtures thereof. Probes can be enzymatically synthesized from
DNAs of known PKS-like genes for normal or reduced-stringency hybridization methods.
For discussions of nucleic acid probe design and annealing conditions, see, for example,
Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold
Spring Harbor Laboratory (1989), or Current Protocols in Molecular Biology, F.
Ausubel et al., ed., Greene Publishing and Wiley-Interscience, New York (1987), each of
which is incorporated herein by reference. Techniques for manipulation of nucleic acids
encoding PUFA enzymes such as subcloning nucleic acid sequences encoding
polypeptides into expression vectors, labelling probes, DNA hybridization, and the like
are described generally in Sambrook, supra.

Oligonucleotide probes also can be used to screen sources and can be based on
sequences of known PKS-like genes, including sequences conserved among known
PKS-like genes, or on peptide sequences obtained from a desired purified protein.
Oligonucleotide probes based on amino acid sequences can be degenerate to encompass
the degeneracy of the genetic code, or can be biased in favor of the preferred codons of
the source organism. Alternatively, a desired protein can be entirely sequenced and total
synthesis of a DNA encoding that polypeptide performed.
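
The degeneracy of a probe designed from peptide sequence can be estimated by multiplying the number of codons available for each residue in the standard genetic code. The Python below is a sketch of that calculation; the function names and the example peptide are hypothetical and are not taken from any disclosed sequence.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct oligonucleotides needed to cover every coding possibility."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


def least_degenerate_window(peptide, length):
    """Return (start, sub-peptide, degeneracy) for the least degenerate window."""
    windows = [
        (degeneracy(peptide[i:i + length]), i)
        for i in range(len(peptide) - length + 1)
    ]
    count, start = min(windows)
    return start, peptide[start:start + length], count


# Hypothetical peptide fragment used only to exercise the functions.
print(least_degenerate_window("LSRMWHKDEL", 6))
```

Windows rich in methionine and tryptophan give the smallest probe pools, which is one reason such stretches are favored when a fully degenerate probe is wanted.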

Once the desired DNA has been isolated, it can be sequenced by known methods.
It is recognized in the art that such methods are subject to errors, such that multiple
sequencing of the same region is routine and is still expected to lead to measurable rates
of mistakes in the resulting deduced sequence, particularly in regions having repeated
domains, extensive secondary structure, or unusual base compositions, such as regions
with high GC base content. When discrepancies arise, resequencing can be done and can
employ special methods. Special methods can include altering sequencing conditions by
using: different temperatures; different enzymes; proteins which alter the ability of
oligonucleotides to form higher order structures; altered nucleotides such as ITP or
methylated dGTP; different gel compositions, for example adding formamide; different
primers or primers located at different distances from the problem region; or different
templates such as single stranded DNAs. Sequencing of mRNA can also be employed.

For the most part, some or all of the coding sequences for the polypeptides having
PKS-like gene activity are from a natural source. In some situations, however, it is
desirable to modify all or a portion of the codons, for example, to enhance expression, by
employing host preferred codons. Host preferred codons can be determined from the
codons of highest frequency in the proteins expressed in the largest amount in a particular
host species of interest. Thus, the coding sequence for a polypeptide having PKS-like
gene activity can be synthesized in whole or in part. All or portions of the DNA also can
be synthesized to remove any destabilizing sequences or regions of secondary structure
which would be present in the transcribed mRNA. All or portions of the DNA also can
be synthesized to alter the base composition to one more preferable to the desired host
cell. Methods for synthesizing sequences and bringing sequences together are well
established in the literature. In vitro mutagenesis and selection, site-directed mutagenesis,
or other means can be employed to obtain mutations of naturally occurring PKS-like
genes to produce a polypeptide having PKS-like gene activity in vivo with more desirable
physical and kinetic parameters for function in the host cell, such as a longer half-life or a
higher rate of production of a desired polyunsaturated fatty acid.
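
The determination of host preferred codons described above can be sketched in Python. This is a minimal illustration under stated assumptions: the reference coding sequence stands in for a set of highly expressed host genes and is invented for the example, the function names are hypothetical, and ties between equally frequent codons are broken arbitrarily.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(cds):
    """Split a coding sequence into whole codons, ignoring any trailing bases."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    return "".join(GENETIC_CODE[c] for c in codons(cds))


def preferred_codons(highly_expressed):
    """Most frequent codon per amino acid across the reference coding sequences."""
    usage = defaultdict(Counter)
    for cds in highly_expressed:
        for codon in codons(cds):
            usage[GENETIC_CODE[codon]][codon] += 1
    return {aa: counts.most_common(1)[0][0] for aa, counts in usage.items()}


def recode(cds, preferred):
    """Replace each codon with the host preferred synonym, if one is known."""
    return "".join(preferred.get(GENETIC_CODE[c], c) for c in codons(cds))


# Invented reference sequence standing in for highly expressed host genes.
reference = ["ATGCTGCTGAAAGAACTGTAA"]
table = preferred_codons(reference)
original = "ATGTTAAAGGAGTTGTGA"
optimized = recode(original, table)
print(optimized)
```

The recoded sequence encodes the same polypeptide as the original, which is the property any codon modification of a PKS-like gene would need to preserve.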

Of particular interest are the Shewanella putrefaciens ORFs and the corresponding
ORFs of Vibrio marinus. The Shewanella putrefaciens PKS-like genes can be expressed
in transgenic plants to effect biosynthesis of EPA. Other DNAs which are substantially
identical in sequence to the Shewanella putrefaciens PKS-like genes, or which encode
polypeptides which are substantially similar to PKS-like genes of Shewanella
putrefaciens can be used, such as those identified from Vibrio marinus. By substantially
identical in sequence is intended an amino acid sequence or nucleic acid sequence
exhibiting in order of increasing preference at least 60%, 80%, 90% or 95% homology to
the DNA sequence of the Shewanella putrefaciens PKS-like genes or nucleic acid
sequences encoding the amino acid sequences for such genes. For polypeptides, the
length of comparison sequences generally is at least 16 amino acids, preferably at least 20
amino acids, and most preferably 35 amino acids. For nucleic acids, the length of
comparison sequences generally is at least 50 nucleotides, preferably at least 60
nucleotides, more preferably at least 75 nucleotides, and most preferably 110
nucleotides.
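
The thresholds and minimum comparison lengths above can be expressed as a simple check. The Python below is a sketch only: it assumes the two sequences are already aligned to equal length with `-` marking gaps, it uses plain percent identity as a stand-in for the homology score that alignment software would report, and the function names and test sequences are hypothetical.

```python
THRESHOLDS = (60, 80, 90, 95)                    # tiers of increasing preference
MIN_LENGTH = {"protein": 16, "nucleic": 50}      # minimum comparison lengths


def percent_identity(a, b):
    """Percent identity over aligned, ungapped columns, plus the column count."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared), len(compared)


def preference_tier(a, b, kind):
    """Highest threshold met, or None if too short or below 60%."""
    identity, compared = percent_identity(a, b)
    if compared < MIN_LENGTH[kind]:
        return None
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


# Hypothetical 20-residue sequences differing at two positions.
print(preference_tier("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVAA", "protein"))
```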

Homology typically is measured using sequence analysis software, for example,
the Sequence Analysis software package of the Genetics Computer Group, University of
Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wisconsin 53705;
MEGAlign (DNAStar, Inc., 1228 S. Park St., Madison, Wisconsin 53715);
MacVector (Oxford Molecular Group, 2105 S. Bascom Avenue, Suite 200, Campbell,
California 95008); BLAST (National Center for Biotechnology Information (NCBI),
www.ncbi.nhn.gov); and FASTA (Pearson and Lipman, Science (1985) 227: 1435-1446). Such
software matches similar sequences by assigning degrees of homology to various
substitutions, deletions, and other modifications. Conservative substitutions typically
include substitutions within the following groups: glycine and alanine; valine, isoleucine
and leucine; aspartic acid, glutamic acid, asparagine, and glutamine; serine and threonine;
lysine and arginine; and phenylalanine and tyrosine. Substitutions may also be made on
the basis of conserved hydrophobicity or hydrophilicity (Kyte and Doolittle, J. Mol. Biol.
(1982) 157: 105-132), or on the basis of the ability to assume similar polypeptide
secondary structure (Chou and Fasman, Adv. Enzymol. (1978) 47: 45-148). A
related protein to the probing sequence is identified when p 2: 0.01, preferably p ^ 10 or
10 

Encompassed by the present invention are related PKS-like genes from the same
or other organisms. Such related PKS-like genes include variants of the disclosed
PKS-like ORFs that occur naturally within the same or different species of Shewanella, as
well as homologues of the disclosed PKS-like genes from other species and evolutionarily
related proteins having analogous function and activity. Also included are PKS-like
genes which, although not substantially identical to the Shewanella putrefaciens
PKS-like genes, operate in a similar fashion to produce PUFAs as part of a PKS-like system.
Related PKS-like genes can be identified by their ability to function substantially the
same as the disclosed PKS-like genes; that is, they can be substituted for corresponding
ORFs of Shewanella or Vibrio and still effectively produce EPA or DHA. Related
PKS-like genes also can be identified by screening sequence databases for sequences
homologous to the disclosed PKS-like genes, by hybridization of a probe based on the
disclosed PKS-like genes to a library constructed from the source organism, or by
RT-PCR using mRNA from the source organism and primers based on the disclosed PKS-like
gene. Thus, the phrase "PKS-like genes" refers not only to the nucleotide sequences
disclosed herein, but also to other nucleic acids that are allelic or species variants of these
nucleotide sequences. It is also understood that these terms include nonnatural mutations
introduced by deliberate mutation using recombinant technology such as single site
mutation or by excising short sections of DNA open reading frames coding for PUFA
enzymes or by substituting new codons or adding new codons. Such minor alterations
substantially maintain the immunoidentity of the original expression product and/or its
biological activity. The biological properties of the altered PUFA enzymes can be
determined by expressing the enzymes in an appropriate cell line and by determining the
ability of the enzymes to synthesize PUFAs. Particular enzyme modifications considered
minor would include substitution of amino acids of similar chemical properties, e.g.,
glutamic acid for aspartic acid or glutamine for asparagine.

When utilizing a PUFA PKS-like system from another organism, the regions of a
PKS-like gene polypeptide important for PKS-like gene activity can be determined
through routine mutagenesis, expression of the resulting mutant polypeptides and
determination of their activities. The coding region for the mutants can include deletions,
insertions and point mutations, or combinations thereof. A typical functional analysis
begins with deletion mutagenesis to determine the N- and C-terminal limits of the protein
necessary for function, and then internal deletions, insertions or point mutants are made in
the open reading frame to further determine regions necessary for function. Other
techniques such as cassette mutagenesis or total synthesis also can be used. Deletion
mutagenesis is accomplished, for example, by using exonucleases to sequentially remove
the 5' or 3' coding regions. Kits are available for such techniques. After deletion, the
coding region is completed by ligating oligonucleotides containing start or stop codons to
the deleted coding region after 5' or 3' deletion, respectively. Alternatively,
oligonucleotides encoding start or stop codons are inserted into the coding region by a
variety of methods including site-directed mutagenesis, mutagenic PCR or by ligation
onto DNA digested at existing restriction sites. Internal deletions can similarly be made
through a variety of methods including the use of existing restriction sites in the DNA, by
use of mutagenic primers via site-directed mutagenesis or mutagenic PCR. Insertions are
made through methods such as linker-scanning mutagenesis, site-directed mutagenesis or
mutagenic PCR. Point mutations are made through techniques such as site-directed
mutagenesis or mutagenic PCR.
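The terminal deletion series described above can be planned in silico before any bench work. The sketch below is illustrative only: it assumes deletions in whole-codon steps (real exonuclease digestion is not codon-precise), and the names `split_codons` and `deletion_series` and the choice of ATG and TAA as the ligated start and stop codons are assumptions made for the example.

```python
START_CODON = "ATG"  # start codon supplied by the ligated oligonucleotide (assumed)
STOP_CODON = "TAA"   # stop codon supplied by the ligated oligonucleotide (assumed)


def split_codons(cds):
    """Split a coding sequence into codons; the length must be a multiple of 3."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def deletion_series(cds, step=1):
    """Return (five_prime, three_prime) lists of in-frame terminal deletions.

    cds  -- full coding sequence including its own start and stop codons
    step -- number of codons removed per successive construct
    """
    codons = split_codons(cds.upper())
    if len(codons) < 3:
        raise ValueError("need a start codon, at least one codon, and a stop codon")
    start, interior, stop = codons[0], codons[1:-1], codons[-1]
    five_prime, three_prime = [], []
    for n in range(step, len(interior), step):
        # 5' deletion: drop n N-terminal codons, then restore a start codon.
        five_prime.append(START_CODON + "".join(interior[n:]) + stop)
        # 3' deletion: drop n C-terminal codons, then restore a stop codon.
        three_prime.append(start + "".join(interior[:-n]) + STOP_CODON)
    return five_prime, three_prime
```

Each returned construct would then be expressed and assayed, so that the first construct losing activity marks the approximate N- or C-terminal limit of the functional region.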

Chemical mutagenesis also can be used for identifying regions of a PKS-like gene
polypeptide important for activity. A mutated construct is expressed, and the ability of
the resulting altered protein to function as a PKS-like gene is assayed. Such
structure-function analysis can determine which regions may be deleted, which regions
tolerate insertions, and which point mutations allow the mutant protein to function in
substantially the same way as the native PKS-like gene. All such mutant proteins and
nucleotide sequences encoding them are within the scope of the present invention.

EPA is produced in Shewanella as the product of a PKS-like system, such that the EPA
genes encode components of this system. In Vibrio, DHA is produced by a similar system.
The enzymes which synthesize these fatty acids are encoded by a cluster of genes which are
distinct from the fatty acid synthesis genes encoding the enzymes involved in synthesis of
the C16 and C18 fatty acids typically found in bacteria and in plants. As the Shewanella
EPA genes represent a PKS-like gene cluster, EPA production is, at least to some extent,
independent of the typical bacterial type II FAS system. Thus, production of EPA in the
cytoplasm of plant cells can be achieved by expression of the PKS-like pathway genes in
plant cells under the control of appropriate plant regulatory signals.

EPA production in E. coli transformed with the Shewanella EPA genes proceeds
during anaerobic growth, indicating that O2-dependent desaturase reactions are not
involved. Analyses of the proteins encoded by the ORFs essential for EPA production
reveal the presence of domain structures characteristic of PKS-like systems. Fig. 2A
shows a summary of the domains, motifs, and also key homologies detected by BLAST
data bank searches. Because EPA is different from many of the other substances
produced by PKS-like pathways, i.e., it contains 5 cis double bonds, spaced at 3-carbon
intervals along the molecule, a PKS-like system for synthesis of EPA is not expected.
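The regular spacing of the double bonds can be made concrete with a short calculation. The sketch below assumes the standard shorthand for these fatty acids (EPA as 20:5 n-3, DHA as 22:6 n-3), which is general lipid nomenclature rather than something stated in this passage; the function name `double_bond_positions` is hypothetical.

```python
def double_bond_positions(chain_length, n_double_bonds, omega=3, spacing=3):
    """Return delta positions (counted from the carboxyl carbon) of the double bonds.

    chain_length   -- number of carbons in the fatty acid
    n_double_bonds -- number of cis double bonds
    omega          -- position of the first double bond counted from the methyl end
    spacing        -- carbons between the starts of successive double bonds
    """
    first = chain_length - omega  # double bond nearest the methyl end
    positions = [first - spacing * i for i in range(n_double_bonds)]
    if positions[-1] < 2:
        raise ValueError("too many double bonds for this chain length")
    return sorted(positions)


# EPA, assumed 20:5 n-3: five double bonds at 3-carbon intervals.
epa = double_bond_positions(20, 5)
# DHA, assumed 22:6 n-3.
dha = double_bond_positions(22, 6)
```

Under these assumptions the EPA double bonds fall at carbons 5, 8, 11, 14 and 17, which is the 3-carbon spacing referred to in the text.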

Further, BLAST searches using the domains present in the Shewanella EPA ORFs
reveal that several are related to proteins encoded by a PKS-like gene cluster found in
Anabaena. The structure of that region of the Anabaena chromosome is shown in Fig. 2F.
The Anabaena PKS-like genes have been linked to the synthesis of a long-chain (C26),
hydroxy-fatty acid found in a glycolipid layer of heterocysts. The EPA protein domains
with homology to the Anabaena proteins are indicated in Fig. 2F.

ORF 6 of Shewanella contains a KAS domain which includes an active site motif
(DXAC*) as well as a "GFGG" motif which is present at the end of many Type II KAS
proteins (see Fig. 2A). Extended motifs are present but not shown here. Next is a
malonyl-CoA:ACP acyl transferase (AT) domain. Sequences near the active site motif
(GHS*XG) suggest it transfers malonate rather than methylmalonate, i.e., it resembles the
acetate-like ATs. Following a linker region, there is a cluster of 6 repeating domains,
each ~100 amino acids in length, which are homologous to PKS-like ACP sequences.
Each contains a pantetheine binding site motif (LGXDS*(L/I)). The presence of 6 such
ACP domains has not been observed previously in fatty acid synthases (FAS) or PKS-like
systems. Near the end of the protein is a region which shows homology to β-keto-ACP
reductases (KR). It contains a pyridine nucleotide binding site motif "GXGXX(G/A/P)".
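The short motifs named for ORF 6 can be located in a protein sequence with regular expressions. The sketch below is illustrative: it assumes that X stands for any residue, that a parenthesized list such as (L/I) means any one of the listed residues, and that the asterisk merely flags a residue and is not part of the pattern. The function names and the test peptide are invented for the example and are not taken from the ORF 6 sequence.

```python
import re

# Motifs as written in the text, keyed by the domain they are associated with.
MOTIFS = {
    "KAS active site": "DXAC*",
    "KAS ending": "GFGG",
    "AT active site": "GHS*XG",
    "ACP pantetheine binding": "LGXDS*(L/I)",
    "KR nucleotide binding": "GXGXX(G/A/P)",
}


def motif_to_regex(motif):
    """Convert the text's motif notation to a regular expression."""
    pattern = motif.replace("*", "")  # asterisk only flags a residue (assumed)
    # (L/I) style alternatives become a character class such as [LI].
    pattern = re.sub(
        r"\(([A-Z/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        pattern,
    )
    return pattern.replace("X", ".")  # X matches any residue


def find_motifs(protein):
    """Return {motif name: [1-based start positions]} for non-overlapping hits."""
    protein = protein.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(motif_to_regex(motif), protein)]
        for name, motif in MOTIFS.items()
    }


# Invented test peptide containing one copy of each motif.
peptide = "MKDAACTT" + "GFGG" + "LL" + "GHSLG" + "AA" + "LGADSL" + "EE" + "GAGTTG" + "KK"
hits = find_motifs(peptide)
```

Applied to a real multi-domain protein, six hits for the pantetheine binding pattern would correspond to the six repeated ACP domains described above; such short patterns also match by chance, so hits are only a starting point for domain assignment.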

The Shewanella ORF 8 begins with a KAS domain, including active site and
ending motifs (Fig. 2C). The best match in the data banks is with the Anabaena HglD.
There is also a domain which has sequence homology to the N-terminal one half of the
Anabaena HglC. This region also shows weak homology to KAS proteins although it
lacks the active site and ending motifs. It has the characteristics of the so-called chain
length factors (CLF) of Type II PKS-like systems. ORF 8 appears to direct the production
of EPA versus DHA by the PKS-like system. ORF 8 also has two domains with
homology to β-hydroxyacyl-ACP dehydrases (DH). The best match for both domains is
with E. coli FabA, a bi-functional enzyme which carries out both the dehydrase reaction
and an isomerization (trans to cis) of the resulting double bond. The first DH domain
contains both the active site histidine (H) and an adjacent cysteine (C) implicated in FabA
catalysis. The second DH domain has the active site H but lacks the adjacent C (Fig. 2C).
BLAST searches with the second DH domain also show matches to FabZ, a second E. coli
DH, which does not possess isomerase activity.

The N-terminal half of ORF 7 (Fig. 2B) has no significant matches in the data
banks. The best match of the C-terminal half is with a C-terminal portion of the
Anabaena HglC. This domain contains an acyl-transferase (AT) motif (GXSXG).
Comparison of the extended active site sequences, based on the crystal structure of the E.
coli malonyl-CoA:ACP AT, reveals that ORF 7 lacks two residues essential for exclusion
of water from the active site (E. coli nomenclature; Q11 and R117). These data suggest
that ORF 7 may function as a thioesterase.

ORF 9 (Fig. 2D) is homologous to an ORF of unknown function in the Anabaena
Hgl cluster. It also exhibits a very weak homology to NIFA, a regulatory protein in
nitrogen fixing bacteria. A regulatory role for the ORF 9 protein has not been excluded.
ORF 3 (Fig. 2E) is homologous to the Anabaena HetI as well as EntD from E. coli and
Sfp of Bacillus. Recently, a new enzyme family of phosphopantetheinyl transferases has
been identified that includes HetI, EntD and Sfp (Lambalot RH, et al. (1996) A new
enzyme superfamily - the phosphopantetheinyl transferases. Chemistry & Biology, Vol 3,
#11, 923-936). The data of Fig. 3 demonstrate that the presence of ORF 3 is required
for addition of β-alanine (i.e. pantetheine) to the ORF 6 protein. Thus, ORF 3 encodes
the phosphopantetheinyl transferase specific for the ORF 6 ACP domains. (See Haydock
SF et al. (1995) Divergent sequence motifs correlated with the substrate specificity of
(methyl)malonyl-CoA:acyl carrier protein transacylase domains in modular polyketide
synthases, FEBS Lett., 374, 246-248). Malonate is the source of the carbons utilized in
the extension reactions of EPA synthesis. Additionally, malonyl-CoA rather than
malonyl-ACP is the AT substrate, i.e., the AT region of ORF 6 uses malonyl-CoA.

Once the DNA sequences encoding the PKS-like genes of an organism responsible
for PUFA production have been obtained, they are placed in a vector capable of
replication in a host cell, or propagated in vitro by means of techniques such as PCR or
long PCR. Replicating vectors can include plasmids, phage, viruses, cosmids and the
like. Desirable vectors include those useful for mutagenesis of the gene of interest or for
expression of the gene of interest in host cells. A PUFA synthesis enzyme or a
homologous protein can be expressed in a variety of recombinantly engineered cells.
Numerous expression systems are available for expression of DNA encoding a PUFA
enzyme. The expression of natural or synthetic nucleic acids encoding PUFA enzyme is
typically achieved by operably linking the DNA to a promoter (which is either
constitutive or inducible) within an expression vector. By expression vector is meant a
DNA molecule, linear or circular, that comprises a segment encoding a PUFA enzyme,
operably linked to additional segments that provide for its transcription. Such additional
segments include promoter and terminator sequences. An expression vector also may
include one or more origins of replication, one or more selectable markers, an enhancer, a
polyadenylation signal, etc. Expression vectors generally are derived from plasmid or
viral DNA, and can contain elements of both. The term "operably linked" indicates that
the segments are arranged so that they function in concert for their intended purposes, for
example, transcription initiates in the promoter and proceeds through the coding segment
to the terminator. See Sambrook et al., supra.

The technique of long PCR has made in vitro propagation of large constructs
possible, so that modifications to the gene of interest, such as mutagenesis or addition of
expression signals, and propagation of the resulting constructs can occur entirely in vitro
without the use of a replicating vector or a host cell. In vitro expression can be
accomplished, for example, by placing the coding region for the desaturase polypeptide in
an expression vector designed for in vitro use and adding rabbit reticulocyte lysate and
cofactors; labeled amino acids can be incorporated if desired. Such in vitro expression
vectors may provide some or all of the expression signals necessary in the system used.
These methods are well known in the art and the components of the system are
commercially available. The reaction mixture can then be assayed directly for PKS-like
enzymes, for example by determining their activity, or the synthesized enzyme can be
purified and then assayed.

Expression in a host cell can be accomplished in a transient or stable fashion.
Transient expression can occur from introduced constructs which contain expression
signals functional in the host cell, but which constructs do not replicate and rarely
integrate in the host cell, or where the host cell is not proliferating. Transient expression
also can be accomplished by inducing the activity of a regulatable promoter operably
linked to the gene of interest, although such inducible systems frequently exhibit a low
basal level of expression. Stable expression can be achieved by introduction of a nucleic
acid construct that can integrate into the host genome or that autonomously replicates in
the host cell. Stable expression of the gene of interest can be selected for through the use
of a selectable marker located on or transfected with the expression construct, followed by
selection for cells expressing the marker. When stable expression results from
integration, integration of constructs can occur randomly within the host genome or can
be targeted through the use of constructs containing regions of homology with the host
genome sufficient to target recombination with the host locus. Where constructs are
targeted to an endogenous locus, all or some of the transcriptional and translational
regulatory regions can be provided by the endogenous locus. To achieve expression in a
host cell, the transformed DNA is operably associated with transcriptional and
translational initiation and termination regulatory regions that are functional in the host
cell.

Transcriptional and translational initiation and termination regions are derived
from a variety of nonexclusive sources, including the DNA to be expressed, genes known
or suspected to be capable of expression in the desired system, expression vectors, and
chemical synthesis. The termination region can be derived from the 3' region of the gene
from which the initiation region was obtained or from a different gene. A large number
of termination regions are known and have been found to be satisfactory in a variety of
hosts from the same and different genera and species. The termination region usually is
selected more as a matter of convenience than because of any particular property.
When expressing more than one PKS-like ORF in the same cell, appropriate regulatory
regions and expression methods should be used. Introduced genes can be propagated in
the host cell through use of replicating vectors or by integration into the host genome.
Where two or more genes are expressed from separate replicating vectors, it is desirable
that each vector has a different means of replication. Each introduced construct, whether
integrated or not, should have a different means of selection and should lack homology to
the other constructs to maintain stable expression and prevent reassortment of elements
among constructs. Judicious choices of regulatory regions, selection means and method
of propagation of the introduced construct can be experimentally determined so that all
introduced genes are expressed at the necessary levels to provide for synthesis of the
desired products.
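The two bookkeeping rules for multi-construct experiments stated above (a different means of replication for each replicating vector, and a different means of selection for every construct) lend themselves to a simple automated check. The sketch below is a hypothetical planning aid; the class `Construct`, the function `check_compatibility` and the example plasmid names are assumptions, and sequence homology between constructs is not modeled.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Construct:
    name: str
    selection_marker: str
    # None means the construct is integrated and has no replication origin.
    replication_origin: Optional[str] = None


def check_compatibility(constructs: List[Construct]) -> List[str]:
    """Return a list of human-readable conflicts; an empty list means none found."""
    problems = []
    for a, b in combinations(constructs, 2):
        # Every construct, integrated or not, needs its own means of selection.
        if a.selection_marker == b.selection_marker:
            problems.append(
                f"{a.name} and {b.name} share selection marker {a.selection_marker}"
            )
        # Replicating vectors should each use a different means of replication.
        if a.replication_origin is not None and a.replication_origin == b.replication_origin:
            problems.append(
                f"{a.name} and {b.name} share replication origin {a.replication_origin}"
            )
    return problems


# Hypothetical example: two replicating plasmids and one integrated cassette.
plan = [
    Construct("pORF6", "ampicillin", "ColE1"),
    Construct("pORF8", "kanamycin", "p15A"),
    Construct("intORF3", "chloramphenicol"),
]
conflicts = check_compatibility(plan)
```

An empty `conflicts` list indicates only that these two rules are satisfied; expression levels still have to be determined experimentally, as the text notes.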
A variety of prokaryotic expression systems can be used to express PUFA enzyme.
Expression vectors can be constructed which contain a promoter to direct transcription, a
ribosome binding site, and a transcriptional terminator. Examples of regulatory regions
suitable for this purpose in E. coli are the promoter and operator region of the E. coli
tryptophan biosynthetic pathway as described by Yanofsky (1984) J. Bacteriol.,
158:1018-1024 and the leftward promoter of phage lambda (Pλ) as described by
Herskowitz and Hagen, (1980) Ann. Rev. Genet., 14:399-445. The inclusion of selection
markers in DNA vectors transformed in E. coli is also useful. Examples of such markers
include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol.
Vectors used for expressing foreign genes in bacterial hosts generally will contain a
selectable marker, such as a gene for antibiotic resistance, and a promoter which functions
in the host cell. Plasmids useful for transforming bacteria include pBR322 (Bolivar, et al.,
(1977) Gene 2:95-113), the pUC plasmids (Messing, (1983) Meth. Enzymol. 101:20-77,
Vieira and Messing, (1982) Gene 19:259-268), pCQV2 (Queen, ibid.), and derivatives
thereof. Plasmids may contain both viral and bacterial elements. Methods for the
recovery of the proteins in biologically active form are discussed in U.S. Patent Nos.
4,966,963 and 4,999,422, which are incorporated herein by reference. See Sambrook, et
al. for a description of other prokaryotic expression systems.

For expression in eukaryotes, host cells for use in practicing the present invention
include mammalian, avian, plant, insect, and fungal cells. As an example, for plants, the
choice of a promoter will depend in part upon whether constitutive or inducible
expression is desired and whether it is desirable to produce the PUFAs at a particular
stage of plant development and/or in a particular tissue. Considerations for choosing a
specific tissue and/or developmental stage for expression of the ORFs may depend on
competing substrates or the ability of the host cell to tolerate expression of a particular
PUFA. Expression can be targeted to a particular location within a host plant such as
seed, leaves, fruits, flowers, and roots, by using specific regulatory sequences, such as
those described in USPN 5,463,174, USPN 4,943,674, USPN 5,106,739, USPN
5,175,095, USPN 5,420,034, USPN 5,188,958, and USPN 5,589,379. Where the host cell
is a yeast, transcription and translational regions functional in yeast cells are provided,
particularly from the host species. The transcriptional initiation regulatory regions can be
obtained, for example, from genes in the glycolytic pathway, such as alcohol
dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase (GPD),
phosphoglucoisomerase, phosphoglycerate kinase, etc. or regulatable genes such as acid
phosphatase, lactase, metallothionein, glucoamylase, etc. Any one of a number of
regulatory sequences can be used in a particular situation, depending upon whether
constitutive or induced transcription is desired, the particular efficiency of the promoter in
conjunction with the open-reading frame of interest, the ability to join a strong promoter
with a control region from a different promoter which allows for inducible transcription,
ease of construction, and the like. Of particular interest are promoters which are activated
in the presence of galactose. Galactose-inducible promoters (GAL1, GAL7, and GAL10)
have been extensively utilized for high level and regulated expression of protein in yeast
(Lue et al. (1987) Mol. Cell. Biol. 7:3446; Johnston, (1987) Microbiol. Rev. 51:458).
Transcription from the GAL promoters is activated by the GAL4 protein, which binds to
the promoter region and activates transcription when galactose is present. In the absence
of galactose, the antagonist GAL80 binds to GAL4 and prevents GAL4 from activating
transcription. Addition of galactose prevents GAL80 from inhibiting activation by GAL4.
Preferably, the termination region is derived from a yeast gene, particularly
Saccharomyces, Schizosaccharomyces, Candida or Kluyveromyces. The 3' regions of
two mammalian genes, γ interferon and α2 interferon, are also known to function in yeast.
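The galactose switch described above reduces to a small piece of Boolean logic. The sketch below is a deliberately simplified on/off model of that description, not a quantitative model of GAL regulation; the function name `gal_promoter_active` and the treatment of a strain lacking GAL80 as constitutively active are assumptions that follow from the logic as stated, not results reported in the text.

```python
def gal_promoter_active(galactose, gal4=True, gal80=True):
    """Return True if a GAL promoter is transcriptionally active in this model.

    galactose -- galactose present in the medium
    gal4      -- functional GAL4 activator present
    gal80     -- functional GAL80 antagonist present
    """
    if not gal4:
        return False  # no activator, no transcription
    # GAL80 inhibits GAL4 only while galactose is absent.
    gal80_inhibiting = gal80 and not galactose
    return not gal80_inhibiting


# Wild-type cells: expression follows galactose in the medium.
induced = gal_promoter_active(galactose=True)
repressed = gal_promoter_active(galactose=False)
```

In this model a gene of interest placed under a GAL promoter is off during growth without galactose and switched on by adding galactose at the chosen induction point.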

Nucleotide sequences surrounding the translational initiation codon ATG have
been found to affect expression in yeast cells. If the desired polypeptide is poorly
expressed in yeast, the nucleotide sequences of exogenous genes can be modified to
include an efficient yeast translation initiation sequence to obtain optimal gene
expression. For expression in Saccharomyces, this can be done by site-directed
mutagenesis of an inefficiently expressed gene by fusing it in-frame to an endogenous
Saccharomyces gene, preferably a highly expressed gene, such as the lactase gene.

An alternative to expressing the PKS-like genes in the plant cell cytoplasm is
to target the enzymes to the chloroplast. One method to target proteins to the chloroplast
entails use of leader peptides attached to the N-termini of the proteins. Commonly used
leader peptides are derived from the small subunit of plant ribulose bisphosphate
carboxylase. Leader sequences from other chloroplast proteins may also be used.
Another method for targeting proteins to the chloroplast is to transform the chloroplast
genome. (Stable transformation of chloroplasts of Chlamydomonas reinhardtii (a green
alga) using bombardment of recipient cells with high-velocity tungsten microprojectiles
coated with foreign DNA has been described. See, for example, Blowers et al., Plant Cell
(1989) 1:123-132 and Debuchy et al., EMBO J. (1989) 8:2803-2809. The transformation
technique, using tungsten microprojectiles, is described by Kline et al., Nature (London)
(1987) 327:70-73.) The most common method of transforming chloroplasts involves
using biolistic techniques, but other techniques developed for the purpose may also be
used. (Methods for targeting foreign gene products into chloroplasts (Shrier et al., EMBO
J. (1985) 4:25-32) or mitochondria (Boutry et al., supra) have been described. See also
Tomai et al., J. Biol. Chem. (1988) 263:15104-15109 and US Patent No. 4,940,835 for
the use of transit peptides for translocating nuclear gene products into the chloroplast.
Methods for directing the transport of proteins to the chloroplast are reviewed in Knauf,
TIBTECH (1987) 5:40-47.)

For producing PUFAs in avian species and cells, gene transfer can be performed
by introducing a nucleic acid sequence encoding a PUFA enzyme into the cells following
procedures known in the art. If a transgenic animal is desired, pluripotent stem cells of
embryos can be provided with a vector carrying a PUFA enzyme encoding transgene and
developed into an adult animal (USPN 5,162,215; Ono et al., (1996) Comparative
Biochemistry and Physiology A 113(3):287-292; WO 9612793; WO 9606160). In most
cases, the transgene is modified to express high levels of the PKS-like enzymes in order
to increase production of PUFAs. The transgenes can be modified, for example, by
providing transcriptional and/or translational regulatory regions that function in avian
cells, such as promoters which direct expression in particular tissues and egg parts such as
yolk. The gene regulatory regions can be obtained from a variety of sources, including
chicken anemia or avian leukosis viruses or avian genes such as a chicken ovalbumin
gene.

Production of PUFAs in insect cells can be conducted using baculovirus
expression vectors harboring PKS-like transgenes. Baculovirus expression vectors are
available from several commercial sources such as Clontech. Methods for producing
hybrid and transgenic strains of algae, such as marine algae, which contain and express a
desaturase transgene also are provided. For example, transgenic marine algae can be
prepared as described in USPN 5,426,040. As with the other expression systems
described above, the timing, extent of expression and activity of the desaturase transgene
can be regulated by fitting the polypeptide coding sequence with the appropriate
transcriptional and translational regulatory regions selected for a particular use. Of
particular interest are promoter regions which can be induced under preselected growth
conditions. For example, introduction of temperature sensitive and/or metabolite
responsive mutations into the desaturase transgene coding sequences, its regulatory
regions, and/or the genome of cells into which the transgene is introduced can be used for
this purpose.

The transformed host cell is grown under appropriate conditions adapted for a
desired end result. For host cells grown in culture, the conditions are typically optimized
to produce the greatest or most economical yield of PUFAs, which relates to the selected
desaturase activity. Media conditions which may be optimized include: carbon source,
nitrogen source, addition of substrate, final concentration of added substrate, form of
substrate added, aerobic or anaerobic growth, growth temperature, inducing agent,
induction temperature, growth phase at induction, growth phase at harvest, pH, density,
and maintenance of selection. Microorganisms such as yeast, for example, are preferably
grown using selected media of interest, which include yeast peptone broth (YPD) and
minimal media (contains amino acids, yeast nitrogen base, and ammonium sulfate, and
lacks a component for selection, for example uracil). Desirably, substrates to be added
are first dissolved in ethanol. Where necessary, expression of the polypeptide of interest
may be induced, for example by including or adding galactose to induce expression from
a GAL promoter.

When increased expression of the PKS-like gene polypeptide in a host cell which
expresses PUFA from a PKS-like system is desired, several methods can be employed.
Additional genes encoding the PKS-like gene polypeptide can be introduced into the host
organism. Expression from the native PKS-like gene locus also can be increased through
homologous recombination, for example by inserting a stronger promoter into the host
genome to cause increased expression, by removing destabilizing sequences from either
the mRNA or the encoded protein by deleting that information from the host genome, or
by adding stabilizing sequences to the mRNA (see USPN 4,910,141 and USPN
5,500,365). Thus, the subject host will have at least one copy of the expression
construct and may have two or more, depending upon whether the gene is integrated into
the genome, amplified, or is present on an extrachromosomal element having multiple
copy numbers. Where the subject host is a yeast, four principal types of yeast plasmid
vectors can be used: Yeast Integrating plasmids (YIps), Yeast Replicating plasmids
(YRps), Yeast Centromere plasmids (YCps), and Yeast Episomal plasmids (YEps). YIps
lack a yeast replication origin and must be propagated as integrated elements in the yeast
genome. YRps have a chromosomally derived autonomously replicating sequence and
are propagated as medium copy number (20 to 40), autonomously replicating, unstably
segregating plasmids. YCps have both a replication origin and a centromere sequence
and propagate as low copy number (10-20), autonomously replicating, stably segregating
plasmids. YEps have an origin of replication from the yeast 2μm plasmid and are
propagated as high copy number, autonomously replicating, irregularly segregating
plasmids. The presence of the plasmids in yeast can be ensured by maintaining selection
for a marker on the plasmid. Of particular interest are the yeast vectors pYES2 (a YEp
plasmid available from Invitrogen, which confers uracil prototrophy and carries a GAL1
galactose-inducible promoter for expression), and pYX424 (a YEp plasmid having a
constitutive TPI promoter and conferring leucine prototrophy; Alber and Kawasaki (1982)
J. Mol. & Appl. Genetics 1:419).
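The properties of the four yeast plasmid classes described above can be collected into a lookup table for choosing a vector type. The sketch below simply restates the properties given in the text; the type name `YeastPlasmid` and the helper `select` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Optional


class YeastPlasmid(NamedTuple):
    full_name: str
    autonomous: bool            # replicates without integrating into the genome
    origin: Optional[str]       # source of the replication origin, if any
    copy_number: str            # as characterized in the text
    segregation: Optional[str]  # behavior at cell division, if autonomous


YEAST_PLASMIDS = {
    "YIp": YeastPlasmid("Yeast Integrating plasmid", False, None,
                        "integrated copies only", None),
    "YRp": YeastPlasmid("Yeast Replicating plasmid", True,
                        "chromosomal autonomously replicating sequence",
                        "medium (20 to 40)", "unstable"),
    "YCp": YeastPlasmid("Yeast Centromere plasmid", True,
                        "replication origin plus centromere sequence",
                        "low (10-20)", "stable"),
    "YEp": YeastPlasmid("Yeast Episomal plasmid", True,
                        "yeast 2 micron plasmid origin", "high", "irregular"),
}


def select(autonomous: Optional[bool] = None,
           segregation: Optional[str] = None) -> List[str]:
    """Return plasmid classes matching the requested properties (None = any)."""
    return [
        key for key, p in YEAST_PLASMIDS.items()
        if (autonomous is None or p.autonomous == autonomous)
        and (segregation is None or p.segregation == segregation)
    ]
```

For instance, `select(segregation="stable")` returns only the centromere class, while `select(autonomous=True)` returns the three autonomously replicating classes.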

The choice of a host cell is influenced in part by the desired PUFA profile of the
transgenic cell, and the native profile of the host cell. Even where the host cell expresses
PKS-like gene activity for one PUFA, expression of PKS-like genes of another PKS-like
system can provide for production of a novel PUFA not produced by the host cell. In
particular instances where expression of PKS-like gene activity is coupled with
expression of an ORF 8 PKS-like gene of an organism which produces a different PUFA,
it can be desirable that the host cell naturally have, or be mutated to have, low PKS-like
gene activity for ORF 8. As an example, for production of EPA, the DNA sequence used
encodes the polypeptide having PKS-like gene activity of an organism which produces
EPA, while for production of DHA, the DNA sequences used are those from an organism
which produces DHA. For use in a host cell which already expresses PKS-like gene
activity it can be necessary to utilize an expression cassette which provides for
overexpression of the desired PKS-like genes alone or with a construct to down-regulate
the activity of an existing ORF of the existing PKS-like system, such as by antisense or
co-suppression. Similarly, a combination of ORFs derived from separate organisms
which produce the same or different PUFAs using PKS-like systems may be used. For
instance, the ORF 8 of Vibrio directs the production of DHA in a host cell, even when
ORFs 3, 6, 7 and 9 are from Shewanella, which produce EPA when coupled to ORF 8 of
Shewanella. Therefore, for production of eicosapentaenoic acid (EPA), the expression
cassettes used generally include one or more cassettes which include ORFs 3, 6, 7, 8 and
9 from a PUFA-producing organism such as the marine bacterium Shewanella
putrefaciens (for EPA production) or Vibrio marinus (for DHA production). ORF 8 can
be used for induction of DHA production, and ORF 8 of Vibrio can be used in
conjunction with ORFs 3, 6, 7 and 9 of Shewanella to produce DHA. The organization
and numbering scheme of the ORFs identified in the Shewanella gene cluster are shown
in Fig. 1A. Maps of several subclones referred to in this study are shown in Fig. 1B. For
expression of a PKS-like gene polypeptide, transcriptional and translational initiation and
termination regions functional in the host cell are operably linked to the DNA encoding
the PKS-like gene polypeptide.
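The rule that the source of ORF 8 determines the product when ORFs are mixed between organisms can be written as a short lookup. The sketch below encodes only the two cases described in the text (ORF 8 from Shewanella gives EPA, ORF 8 from Vibrio gives DHA); the function name `predicted_pufa` and the dictionary layout are hypothetical, and the model makes no prediction about yield or about ORF sources other than these two.

```python
# Product associated with the source organism of ORF 8, per the text.
PRODUCT_BY_ORF8_SOURCE = {"Shewanella": "EPA", "Vibrio": "DHA"}

# ORFs that together make up the PKS-like system, per the text.
REQUIRED_ORFS = {3, 6, 7, 8, 9}


def predicted_pufa(orf_sources):
    """Predict the PUFA produced from a mapping of ORF number to source organism."""
    missing = REQUIRED_ORFS - set(orf_sources)
    if missing:
        raise ValueError(f"missing ORFs: {sorted(missing)}")
    orf8_source = orf_sources[8]
    if orf8_source not in PRODUCT_BY_ORF8_SOURCE:
        raise ValueError(f"no product known for ORF 8 from {orf8_source}")
    return PRODUCT_BY_ORF8_SOURCE[orf8_source]


# All five ORFs from Shewanella.
all_shewanella = {orf: "Shewanella" for orf in REQUIRED_ORFS}
# ORF 8 swapped for the Vibrio gene, the other four kept from Shewanella.
hybrid = dict(all_shewanella)
hybrid[8] = "Vibrio"
```

Here `predicted_pufa(all_shewanella)` gives EPA and `predicted_pufa(hybrid)` gives DHA, mirroring the ORF 8 swap described above.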

Constructs comprising the PKS-like ORFs of interest can be introduced into a host
cell by any of a variety of standard techniques, depending in part upon the type of host
cell. These techniques include transfection, infection, biolistic impact, electroporation,
microinjection, scraping, or any other method which introduces the gene of interest into
the host cell (see USPN 4,743,548, USPN 4,795,855, USPN 5,068,193, USPN 5,188,958,
USPN 5,463,174, USPN 5,565,346 and USPN 5,565,347). Methods of transformation
which are used include lithium acetate transformation (Methods in Enzymology, (1991)
194:186-187). For convenience, a host cell which has been manipulated by any method
to take up a DNA sequence or construct will be referred to as "transformed" or
"recombinant" herein. The subject host will have at least one copy of the expression
construct and may have two or more, depending upon whether the gene is integrated into
the genome, amplified, or is present on an extrachromosomal element having multiple
copy numbers.

For production of PUFAs, depending upon the host cell, the several polypeptides 
produced by pEPA, ORFs 3, 6, 7, 8 and 9, are introduced as individual expression 
constructs or can be combined into two or more cassettes which are introduced 
individually or co-transformed into a host cell. A standard transformation protocol is
used. For plants, where less than all PKS-like genes required for PUFA synthesis have 
been inserted into a single plant, plants containing a complementing gene or genes can be 
crossed to obtain plants containing a full complement of PKS-like genes to synthesize a 
desired PUFA. 

The PKS-like-mediated production of PUFAs can be performed in either
prokaryotic or eukaryotic host cells. The cells can be cultured or formed as part or all of a
host organism including an animal. Viruses and bacteriophage also can be used with
appropriate cells in the production of PUFAs, particularly for gene transfer, cellular
targeting and selection. Any type of plant cell can be used for host cells, including
dicotyledonous plants, monocotyledonous plants, and cereals. Of particular interest are
crop plants such as Brassica, Arabidopsis, soybean, corn, and the like. Prokaryotic cells
of interest include Escherichia, Bacillus, Lactobacillus, cyanobacteria and the like.
Eukaryotic cells include plant cells, mammalian cells such as those of lactating animals,
avian cells such as of chickens, and other cells amenable to genetic manipulation
including insect, fungal, and algae cells. Examples of host animals include mice, rats,
rabbits, chickens, quail, turkeys, cattle, sheep, pigs, goats, yaks, etc., which are amenable
to genetic manipulation and cloning for rapid expansion of a transgene expressing
population. For animals, PKS-like transgenes can be adapted for expression in target
organelles, tissues and body fluids through modification of the gene regulatory regions.
Of particular interest is the production of PUFAs in the breast milk of the host animal.

Examples of host microorganisms include Saccharomyces cerevisiae,
Saccharomyces carlsbergensis, or other yeast such as Candida, Kluyveromyces or other
fungi, for example, filamentous fungi such as Aspergillus, Neurospora, Penicillium, etc.
Desirable characteristics of a host microorganism are, for example, that it is genetically
well characterized, can be used for high level expression of the product using ultra-high
density fermentation, and is on the GRAS (generally recognized as safe) list since the
proposed end product is intended for ingestion by humans. Of particular interest is use of
a yeast, more particularly baker's yeast (S. cerevisiae), as a cell host in the subject
invention. Strains of particular interest are SC334 (Mat a pep4-3 prb1-1122 ura3-52
leu2-3,112 reg1-501 gal1; Hovland et al. (1989) Gene 83:57-64), BJ1995 (Yeast Genetic
Stock Centre, 1021 Donner Laboratory, Berkeley, CA 94720), INVSC1 (Mat a his3Δ1
leu2 trp1-289 ura3-52; Invitrogen, 1600 Faraday Ave., Carlsbad, CA 92008) and INVSC2
(Mat a his3Δ200 ura3-167; Invitrogen). Bacterial cells also may be used as hosts. This
includes E. coli, which can be useful in fermentation processes. Alternatively, a host such
as a Lactobacillus species can be used as a host for introducing the products of the
PKS-like pathway into a product such as yogurt.

The transformed host cell can be identified by selection for a marker contained on
the introduced construct. Alternatively, a separate marker construct can be introduced
with the desired construct, as many transformation techniques introduce multiple DNA
molecules into host cells. Typically, transformed hosts are selected for their ability to
grow on selective media. Selective media can incorporate an antibiotic or lack a factor
necessary for growth of the untransformed host, such as a nutrient or growth factor. An
introduced marker gene therefore may confer antibiotic resistance, or encode an essential
growth factor or enzyme, and permit growth on selective media when expressed in the
transformed host cell. Resistance to kanamycin and the aminoglycoside G418
is of particular interest (see USPN 5,034,322). For yeast transformants, any marker that
functions in yeast can be used, such as the ability to grow on media lacking uracil,
leucine, lysine or tryptophan.

Selection of a transformed host also can occur when the expressed marker protein
can be detected, either directly or indirectly. The marker protein can be expressed alone
or as a fusion to another protein. The marker protein can be one which is detected by its
enzymatic activity; for example, β-galactosidase can convert the substrate X-gal to a
colored product, and luciferase can convert luciferin to a light-emitting product. The
marker protein can be one which is detected by its light-producing or modifying
characteristics; for example, the green fluorescent protein of Aequorea victoria fluoresces
when illuminated with blue light. Antibodies can be used to detect the marker protein or
a molecular tag on, for example, a protein of interest. Cells expressing the marker protein
or tag can be selected, for example, visually, or by techniques such as FACS or panning
using antibodies.

The PUFAs produced using the subject methods and compositions are found in
the host plant tissue and/or plant part as free fatty acids and/or in conjugated forms such
as acylglycerols, phospholipids, sulfolipids or glycolipids, and can be extracted from the
host cell through a variety of means well-known in the art. Such means include extraction
with organic solvents, sonication, supercritical fluid extraction using for example carbon
dioxide, and physical means such as presses, or combinations thereof. Of particular
interest is extraction with methanol and chloroform. Where appropriate, the aqueous
layer can be acidified to protonate negatively charged moieties and thereby increase
partitioning of desired products into the organic layer. After extraction, the organic
solvents can be removed by evaporation under a stream of nitrogen. When isolated in
conjugated forms, the products are enzymatically or chemically cleaved to release the free
fatty acid or a less complex conjugate of interest, and are then subjected to further
manipulations to produce a desired end product. Desirably, conjugated forms of fatty
acids are cleaved with potassium hydroxide.

If further purification is necessary, standard methods can be employed. Such
methods include extraction, treatment with urea, fractional crystallization, HPLC,
fractional distillation, silica gel chromatography, high speed centrifugation or distillation,
or combinations of these techniques. Protection of reactive groups, such as the acid or
alkenyl groups, can be done at any step through known techniques, for example alkylation
or iodination. Methods used include methylation of the fatty acids to produce methyl
esters. Similarly, protecting groups can be removed at any step. Desirably, purification
of fractions containing DHA and EPA is accomplished by treatment with urea and/or
fractional distillation.

The uses of the subject invention are several. Probes based on the DNAs of the
present invention find use in methods for isolating related molecules or in methods to
detect organisms expressing PKS-like genes. When used as probes, the DNAs or
oligonucleotides need to be detectable. This is usually accomplished by attaching a label
either at an internal site, for example via incorporation of a modified residue, or at the 5'
or 3' terminus. Such labels can be directly detectable, can bind to a secondary molecule
that is detectably labeled, or can bind to an unlabelled secondary molecule and a
detectably labeled tertiary molecule; this process can be extended as long as is practicable
to achieve a satisfactorily detectable signal without unacceptable levels of background
signal. Secondary, tertiary, or bridging systems can include use of antibodies directed
against any other molecule, including labels or other antibodies, or can involve any
molecules which bind to each other, for example a biotin-streptavidin/avidin system.
Detectable labels typically include radioactive isotopes, molecules which chemically or
enzymatically produce or alter light, enzymes which produce detectable reaction products,
magnetic molecules, fluorescent molecules or molecules whose fluorescence or
light-emitting characteristics change upon binding. Examples of labelling methods can be
found in USPN 5,011,770. Alternatively, the binding of target molecules can be directly
detected by measuring the change in heat of solution on binding of a probe to a target via
isothermal titration calorimetry, or by coating the probe or target on a surface and
detecting the change in scattering of light from the surface produced by binding of a target
or a probe, respectively, as is done with the BIAcore system.

PUFAs produced by recombinant means find applications in a wide variety of
areas. Supplementation of humans or animals with PUFAs in various forms can result in
increased levels not only of the added PUFAs, but of their metabolic progeny as well.
Complex regulatory mechanisms can make it desirable to combine various PUFAs, or to
add different conjugates of PUFAs, in order to prevent, control or overcome such
mechanisms to achieve the desired levels of specific PUFAs in an individual. In the
present case, expression of PKS-like genes, or antisense PKS-like gene transcripts,
can alter the levels of specific PUFAs, or derivatives thereof, found in plant parts and/or
plant tissues. The PKS-like gene polypeptide coding region is expressed either by itself
or with other genes, in order to produce tissues and/or plant parts containing higher
proportions of desired PUFAs or containing a PUFA composition which more closely
resembles that of human breast milk (Prieto et al., PCT publication WO 95/24494) than
do the unmodified tissues and/or plant parts.

PUFAs, or derivatives thereof, made by the disclosed method can be used as
dietary supplements for patients undergoing intravenous feeding or for preventing or
treating malnutrition. For dietary supplementation, the purified PUFAs, or derivatives
thereof, can be incorporated into cooking oils, fats or margarines formulated so that in
normal use the recipient receives a desired amount of PUFA. The PUFAs also can be
incorporated into infant formulas, nutritional supplements or other food products, and
find use as anti-inflammatory or cholesterol lowering agents.

Particular fatty acids such as EPA can be used to alter the composition of infant
formulas to better replicate the PUFA composition of human breast milk. The
predominant triglyceride in human milk is reported to be 1,3-di-oleoyl-2-palmitoyl, with
2-palmitoyl glycerides reported as better absorbed than 2-oleoyl or 2-linoleoyl glycerides
(see USPN 4,876,107). Typically, human breast milk has a fatty acid profile comprising
from about 0.15 % to about 0.36 % as DHA, from about 0.03 % to about 0.13 % as EPA,
from about 0.30 % to about 0.88 % as ARA, from about 0.22 % to about 0.67 % as
DGLA, and from about 0.27 % to about 1.04 % as GLA. A preferred ratio of
GLA:DGLA:ARA in infant formulas is from about 1:1:4 to about 1:1:1, respectively.
Amounts of oils providing these ratios of PUFA can be determined without undue
experimentation by one of skill in the art. PUFAs, or host cells containing them, also can
be used as animal food supplements to alter an animal's tissue or milk fatty acid
composition to one more desirable for human or animal consumption.
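
By way of illustration only, the comparison of a candidate oil blend against the reported human milk ranges can be sketched as follows. The ranges are those recited above; the function names and the example blend are hypothetical and are not part of the disclosure.

```python
# Reported human breast milk ranges recited above (percent of fatty acids).
HUMAN_MILK_RANGES = {
    "DHA": (0.15, 0.36),
    "EPA": (0.03, 0.13),
    "ARA": (0.30, 0.88),
    "DGLA": (0.22, 0.67),
    "GLA": (0.27, 1.04),
}


def out_of_range(profile):
    """Return {fatty_acid: value} for entries missing or outside the range."""
    flagged = {}
    for name, (low, high) in HUMAN_MILK_RANGES.items():
        value = profile.get(name)
        if value is None or not (low <= value <= high):
            flagged[name] = value
    return flagged


def gla_dgla_ara_ratio(profile):
    """Express GLA:DGLA:ARA normalized so that GLA equals 1."""
    gla = profile["GLA"]
    return (1.0, profile["DGLA"] / gla, profile["ARA"] / gla)


# Hypothetical blend, invented purely for illustration.
example_blend = {"DHA": 0.20, "EPA": 0.05, "ARA": 0.60, "DGLA": 0.30, "GLA": 0.30}

print(out_of_range(example_blend))
print(gla_dgla_ara_ratio(example_blend))
```

A blend whose normalized ARA term falls between 1 and 4, with DGLA near 1, lies within the preferred ratio window stated above.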

For pharmaceutical use (human or veterinary), the compositions generally are
administered orally but can be administered by any route by which they may be
successfully absorbed, e.g., parenterally (i.e. subcutaneously, intramuscularly or
intravenously), rectally or vaginally or topically, for example, as a skin ointment or lotion.
Where available, gelatin capsules are the preferred form of oral administration. Dietary
supplementation as set forth above also can provide an oral route of administration. The
unsaturated acids of the present invention can be administered in conjugated forms, or as
salts, esters, amides or prodrugs of the fatty acids. Any pharmaceutically acceptable salt
is encompassed by the present invention; especially preferred are the sodium, potassium
or lithium salts. Also encompassed are the N-alkylpolyhydroxamine salts, such as
N-methyl glucamine, described in PCT publication WO 96/33155. Preferred esters are the
ethyl esters.

The PUFAs of the present invention can be administered alone or in combination
with a pharmaceutically acceptable carrier or excipient. As solid salts, the PUFAs can
also be administered in tablet form. For intravenous administration, the PUFAs or
derivatives thereof can be incorporated into commercial formulations such as Intralipids.
Where desired, the individual components of formulations can be individually provided in
kit form, for single or multiple use. A typical dosage of a particular fatty acid is from 0.1
mg to 20 g, or even 100 g daily, and is preferably from 10 mg to 1, 2, 5 or 10 g daily as
required, or molar equivalent amounts of derivative forms thereof. Parenteral nutrition
compositions comprising from about 2 to about 30 weight percent fatty acids calculated
as triglycerides are encompassed by the present invention. Other vitamins, and
particularly fat-soluble vitamins such as vitamin A, D, E and L-carnitine optionally can be
included. Where desired, a preservative such as a tocopherol can be added, typically at
about 0.1 % by weight.

The following examples are presented by way of illustration, not of limitation.

EXAMPLES 

Example 1 

The Identity of ORFs Derived from Vibrio marinus 

Polymerase chain reaction (PCR) was performed with primers based on ORF 6 of
Shewanella (Sp ORF 6) sequences (FW 5' primers CUACUACUACUACCAAGCTAAAGCACTTAACCGTG
and CUACUACUACUAACAGCGAAATGCTTATCAAG for Vibrio and SS9, respectively, and 3' BW
primer CAUCAUCAUCAUGCGACCAAAACCAAATGAGCTAATAC for both Vibrio and SS9) and genomic
DNA templates from Vibrio and a barophilic Photobacter producing EPA (provided by Dr.
Bartlett, UC San Diego). This resulted in PCR products of ca. 400 bases for Vibrio marinus
(Vibrio) and ca. 900 bases for SS9, presenting more than 75% homology with
corresponding fragments of Sp ORF 6 (see Figure 25) as determined by direct counting of
homologous amino acids.
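
The direct counting referred to above can be sketched, purely as an illustration, as a position-by-position comparison of two pre-aligned amino acid sequences. The sketch counts only identical residues, which is an assumption (the counting used for Figure 25 may also have scored similar residues), and the peptide fragments shown are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap positions carrying identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not taken from Figure 25.
fragment_1 = "MKTAYIAKQR"
fragment_2 = "MKTAYLAKHR"

print(percent_identity(fragment_1, fragment_2))  # 8 of 10 positions match
```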

A Vibrio cosmid library was then prepared and, using the Vibrio ORF 6 PCR
product as a probe (see Figure 26), clones containing at least ORF 6 were selected by
colony hybridization.

Through additional sequencing of the selected cosmids such as cosmid #9 and
cosmid #21, a Vibrio cluster (Figure 5) with ORFs homologous to, and organized in the
same sequential order (ORFs 6-9) as ORFs 6-9 of Shewanella, was obtained (Figure 7).
The Vibrio ORFs from this sequence are found at 17394 to 36115 and comprise ORFs
6-9.

Table
Vibrio operon figures

Position (nt)       Length
17394 to 25349      7956 nt
25509 to 28157      2649 nt
28209 to 34262      6054 nt
34454 to 36115      1662 nt

The ORF designations for the Shewanella genes are based on those disclosed in Figure 4,
and differ from those published for the Shewanella cluster (Yazawa et al., USPN
5,683,898). For instance, ORF 3 of Figure 4 is read in the opposite direction from the
other ORFs and is not disclosed in Yazawa et al., USPN 5,683,898 (see Fig. 24 for
comparison with Yazawa et al., USPN 5,683,898).

Sequences homologous to ORF 3 were not found in the proximity of ORF 6
(17000 bases upstream of ORF 6) or of ORF 9 (ca. 4000 bases downstream of ORF 9).
Motifs characteristic of phosphopantetheinyl transferases (Lambalot et al. (1996) Chemistry
& Biology 3:923-936) were absent from the Vibrio sequences screened for these motifs. In
addition, there was no match to Sp ORF 3 derived probes in genomic digests of Vibrio
and of SC2A Shewanella (another bacterium provided by the University of San Diego and
also capable of producing EPA). Although ORF 3 may exist in Vibrio, its DNA may not
be homologous to that of Sp ORF 3 and/or could be located in portions of the genome that
were not sequenced.

Figure 6 provides the sequence of an approximately 19 kb Vibrio clone
comprising ORFs 6-9. Figures 7 and 8 compare the gene cluster organizations of the
PKS-like systems of Vibrio marinus and Shewanella putrefaciens. Figures 9 through 12
show the levels of sequence homology between the corresponding ORFs 6, 7, 8 and 9,
respectively.

Example 2 

ORF 8 Directs DHA Production

As described in Example 1, DNA homologous to Sp ORF 6 was found in an
unrelated species, SS9 Photobacter, which also is capable of producing EPA.
Additionally, ORFs homologous to Sp ORFs 6-9 were found in the DHA-producing Vibrio
marinus (Vibrio). From these ORFs a series of experiments was designed in which
deletions in each of Sp ORFs 6-9 that suppressed EPA synthesis in E. coli (Yazawa
(1996) supra) were complemented by the corresponding homologous genes from Vibrio.

The Sp EPA cluster was used to determine if any of the Vibrio ORFs 6-9 was
responsible for the production of DHA. Deletion mutants provided for each of the Sp
ORFs are EPA and DHA null. Each deletion was then complemented by the
corresponding Vibrio ORF expressed behind a lac promoter (Figure 13).

The complementation of a Sp ORF 6 deletion by a Vibrio ORF 6 reestablished the
production of EPA. Similar results were obtained by complementing the Sp ORF 7 and
ORF 9 deletions. By contrast, the complementation of a Sp ORF 8 deletion resulted in the
production of C22:6. Vibrio ORF 8 therefore appears to be a key element in the synthesis
of DHA. Figures 14 and 15 show chromatograms of fatty acid profiles from the
respective complementations of Sp del ORF 6 with Vibrio ORF 6 (EPA and no DHA) and
Sp del ORF 8 with Vibrio ORF 8 (DHA). Figure 16 shows the fatty acid percentages for
the ORF 8 complementation, again demonstrating that ORF 8 is responsible for DHA
production.

These data show that polyketide-like synthesis genes with related or similar ORFs
can be combined and expressed in a heterologous system and used to produce a distinct
PUFA species in the host system, and that ORF 8 has a role in determining the ultimate
chain length. The Vibrio ORFs 6, 7, 8, and 9 reestablish EPA synthesis. In the case of
Vibrio ORF 8, DHA is also present (ca. 0.7%) along with EPA (ca. 0.6%), indicating that
this gene plays a significant role in directing synthesis of DHA vs EPA for these systems.


Example 3 

Requirements for Production of DHA

To determine how the Vibrio ORFs of the ORF 6-9 cluster are used in combination
with Vibrio ORF 8, combinations of Vibrio ORF 8 with some or all of the other ORFs of
the Vibrio ORF 6-9 cluster were created to explain the synthesis of DHA.

Vibrio ORFs 6-9 were complemented with Sp ORF 3. The results of this
complementation are presented in Figures 16b and 16c. The significant amounts of DHA
measured (greater than about 9%) and the absence of EPA suggest that no ORFs other
than those of Vibrio ORFs 6-9 are required for DHA synthesis when combined with Sp
ORF 3. This suggests that Sp ORF 3 plays a general function in the synthesis of bacterial
PUFAs.

With respect to the DHA vs EPA production, it may be necessary to combine
Vibrio ORF 8 with other Vibrio ORFs of the 6-9 cluster in order to specifically produce
DHA. The roles of Vibrio ORF 9 and each of the combinations of Vibrio ORFs (6, 8), (7,
8), (8, 9), etc. in the synthesis of DHA are being studied.

Example 4

Plant Expression Constructs 

A cloning vector with very few restriction sites was designed to facilitate the
cloning of large fragments and their subsequent manipulation. An adapter was assembled
by annealing oligonucleotides with the sequences AAGCCCGGGCTT and
GTACAAGCCCGGGCTTAGCT. This adapter was ligated to the vector pBluescript II
SK+ (Stratagene) after digestion of the vector with the restriction endonucleases Asp718
and SstI. The resulting vector, pCGN7769, had a single SrfI (and embedded SmaI) cloning
site for the cloning of blunt ended DNA fragments.

A plasmid containing the napin cassette from pCGN3223 (USPN 5,639,790) was
modified to make it more useful for cloning large DNA fragments containing multiple
restriction sites, and to allow the cloning of multiple napin fusion genes into plant binary
transformation vectors. An adapter comprised of the self annealed oligonucleotide of
sequence CGCGATTTAAATGGCGCGCCCTGCAGGCGGCCGCCTGCAGGGCGCGCCATTTAAAT
was ligated into the vector pBC SK+ (Stratagene) after digestion of the
vector with the restriction endonuclease BssHII to construct vector pCGN7765. Plasmids
pCGN3223 and pCGN7765 were digested with NotI and ligated together. The resultant
vector, pCGN7770 (Figure 17), contains the pCGN7765 backbone and the napin seed
specific expression cassette from pCGN3223.
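
The self-annealing property of the adapter oligonucleotide can be illustrated with a short sketch. It assumes, as an interpretation not stated in the text, that the leading CGCG remains single-stranded as the overhang compatible with BssHII-digested vector, and it checks that the remainder equals its own reverse complement. The recognition sequences listed for SwaI, AscI, Sse8387I and NotI are the commonly reported ones.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


ADAPTER_OLIGO = "CGCGATTTAAATGGCGCGCCCTGCAGGCGGCCGCCTGCAGGGCGCGCCATTTAAAT"
OVERHANG_LENGTH = 4  # assumption: the 5' CGCG is left unpaired after annealing

# Commonly reported recognition sequences (not recited in the text itself).
SITES = {
    "SwaI": "ATTTAAAT",
    "AscI": "GGCGCGCC",
    "Sse8387I": "CCTGCAGG",
    "NotI": "GCGGCCGC",
}

core = ADAPTER_OLIGO[OVERHANG_LENGTH:]
is_self_complementary = core == reverse_complement(core)
sites_present = {name: site in core for name, site in SITES.items()}

print(is_self_complementary)
print(sites_present)
```

Under these assumptions the duplex region is palindromic, so a single oligonucleotide suffices to form the adapter, and the central NotI site is flanked symmetrically by the other sites.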

Shewanella constructs 

Genes encoding the Shewanella proteins were mutagenized to introduce suitable
cloning sites 5' and 3' of the ORFs using PCR. The template for the PCR reactions was DNA
of the cosmid pEPA (Yazawa et al., supra). PCR reactions were performed using Pfu DNA
polymerase according to the manufacturers' protocols. The PCR products were cloned
into SrfI digested pCGN7769. The primers
CTGCAGCTCGAGACAATGTTGATTTCCTTATACTTCTGTCC and
GGATCCAGATCTCTAGCTAGTCTTAGCTGAAGCTCGA were used to amplify ORF 3, and to
generate plasmid pCGN8520. The primers
TCTAGACTCGAGACAATGAGCCAGACCTCTAAACCTACA and
CCCGGGCTCGAGCTAATTCGCCTCACTGTCGTTTGCT were used to amplify ORF 6, and generate
plasmid pCGN7776. The primers
GAATTCCTCGAGACAATGCCGCTGCGCATCGCACTTATC and
GGTACCAGATCTTTAGACTTCCCCTTGAAGTAAATGG were used to amplify ORF 7, and generate
plasmid pCGN7771. The primers
GAATTCGTCGACACAATGTCATTACCAGACAATGCTTCT and
TCTAGAGTCGACTTATACAGATTCTTCGATGCTGATAG were used to amplify ORF 8, and generate
plasmid pCGN7775. The primers
GAATTCGTCGACACAATGAATCCTACAGCAACTAACGAA and
TCTAGAGGATCCTTAGGCCATTCTTTGGTTTGGCTTC were used to amplify ORF 9, and generate
plasmid pCGN7773.
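
The cloning sites carried at the 5' end of each primer can be located with a simple string search, sketched below for two of the forward primers listed above. The table of recognition sequences is a small selection of commonly reported sites supplied for illustration; the function name is hypothetical.

```python
# Commonly reported recognition sequences (a small, illustrative selection).
RECOGNITION_SITES = {
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
    "SalI": "GTCGAC",
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SmaI": "CCCGGG",
    "Asp718": "GGTACC",
}


def find_sites(primer):
    """Map each enzyme to the 0-based position of its first site in the primer."""
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        position = primer.find(site)
        if position != -1:
            hits[enzyme] = position
    return hits


ORF3_FORWARD = "CTGCAGCTCGAGACAATGTTGATTTCCTTATACTTCTGTCC"
ORF8_FORWARD = "GAATTCGTCGACACAATGTCATTACCAGACAATGCTTCT"

for name, primer in (("ORF 3", ORF3_FORWARD), ("ORF 8", ORF8_FORWARD)):
    # The first ATG marks the start codon that follows the added sites.
    print(name, find_sites(primer), "first ATG at", primer.find("ATG"))
```

In both primers two adjacent six-base sites are followed by ACA and then the ATG start codon, which is the layout the constructs rely on when the ORFs are moved into the napin cassette.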

The integrity of the PCR products was verified by DNA sequencing of the inserts
of pCGN7771, pCGN8520, and pCGN7773. ORF 6 and ORF 8 were quite large in size.
In order to avoid sequencing the entire clones, the center portions of the ORFs were
replaced with restriction fragments of pEPA. The 6.6 kilobase PacI/BamHI fragment of
pEPA containing the central portion of ORF 6 was ligated into PacI/BamHI digested
pCGN7776 to yield pCGN7776B4. The 4.4 kilobase BamHI/BglII fragment of pEPA
containing the central portion of ORF 8 was ligated into BamHI/BglII digested
pCGN7775 to yield pCGN7775A. The regions flanking the pEPA fragment and the
cloning junctions were verified by DNA sequencing.

Plasmid pCGN7771 was cut with XhoI and BglII and ligated to pCGN7770 after
digestion with SalI and BglII. The resultant napin/ORF 7 gene fusion plasmid was
designated pCGN7783. Plasmid pCGN8520 was cut with XhoI and BglII and ligated to
pCGN7770 after digestion with SalI and BglII. The resultant napin/ORF 3 gene fusion
plasmid was designated pCGN8528. Plasmid pCGN7773 was cut with SalI and BamHI
and ligated to pCGN7770 after digestion with SalI and BglII. The resultant napin/ORF 9
gene fusion plasmid was designated pCGN7785. Plasmid pCGN7775A was cut with SalI
and ligated to pCGN7770 after digestion with SalI. The resultant napin/ORF 8 gene
fusion plasmid was designated pCGN7782. Plasmid pCGN7776B4 was cut with XhoI
and ligated to pCGN7770 after digestion with SalI. The resultant napin/ORF 6 gene
fusion plasmid was designated pCGN7786B4.

A binary vector for plant transformation, pCGN5139, was constructed from
pCGN1558 (McBride and Summerfelt (1990) Plant Molecular Biology, 14:269-276).
The polylinker of pCGN1558 was replaced as a HindIII/Asp718 fragment with a
polylinker containing unique restriction endonuclease sites, AscI, PacI, XbaI, SwaI,
BamHI, and NotI. The Asp718 and HindIII restriction endonuclease sites are retained in
pCGN5139. pCGN5139 was digested with NotI and ligated with NotI digested
pCGN7786B4. The resultant binary vector containing the napin/ORF 6 gene fusion was
designated pCGN8533. Plasmid pCGN8533 was digested with Sse8387I and ligated with
Sse8387I digested pCGN7782. The resultant binary vector containing the napin/ORF 6
gene fusion and the napin/ORF 8 gene fusion was designated pCGN8535 (Figure 18).

The plant binary transformation vector, pCGN5139, was digested with Asp718
and ligated with Asp718 digested pCGN8528. The resultant binary vector containing the
napin/ORF 3 gene fusion was designated pCGN8532. Plasmid pCGN8532 was digested
with NotI and ligated with NotI digested pCGN7783. The resultant binary vector
containing the napin/ORF 3 gene fusion and the napin/ORF 7 gene fusion was designated
pCGN8534. Plasmid pCGN8534 was digested with Sse8387I and ligated with Sse8387I
digested pCGN7785. The resultant binary vector containing the napin/ORF 3 gene
fusion, the napin/ORF 7 gene fusion and the napin/ORF 9 gene fusion was designated
pCGN8537 (Figure 19).

Vibrio constructs

The Vibrio ORFs for plant expression were all obtained using Vibrio cosmid #9 as
a starting molecule. Vibrio cosmid #9 was one of the cosmids isolated from the Vibrio
cosmid library using the Vibrio ORF 6 PCR product described in Example 1.

A gene encoding Vibrio ORF 7 (Figure 6) was mutagenized to introduce a SalI
site upstream of the open reading frame and a BamHI site downstream of the open reading
frame using the PCR primers: TCTAGAGTCGACACAATGGCGGAATTAGCTGTTATTGGT
and GTCGACGGATCCCTATTTGTTCGTGTTTGCTATATG. A gene
encoding Vibrio ORF 9 (Figure 6) was mutagenized to introduce a BamHI site upstream
of the open reading frame and an XhoI site downstream of the open reading frame using
the PCR primers: GTCGACGGATCCACAATGAATATAGTAAGTAATCATTCGGCA
and GTCGACCTCGAGTTAATCACTCGTACGATAACTTGCC. The restriction sites
were introduced using PCR, and the integrity of the mutagenized plasmids was verified
by DNA sequencing. The Vibrio ORF 7 gene was cloned as a SalI-BamHI fragment into the
napin cassette of SalI-BglII digested pCGN7770 (Figure 17) to yield pCGN8539. The
Vibrio ORF 9 gene was cloned as a SalI-BamHI fragment into the napin cassette of
SalI-BglII digested pCGN7770 (Figure 17) to yield pCGN8543.

Genes encoding the Vibrio ORF 6 and ORF 8 were mutagenized to introduce SalI
sites flanking the open reading frames. The SalI sites flanking ORF 6 were introduced
using PCR. The primers used were: CCCGGGTCGACACAATGGCTAAAAAGAACACCACATCGA
and CCCGGGTCGACTCATGACATATCGTTCAAAATGTCACTGA.
The central 7.3 kb BamHI-XhoI fragment of the PCR product was replaced with the
corresponding fragment from Vibrio cosmid #9. The mutagenized ORF 6 was cloned
into the SalI site of the napin cassette of pCGN7770 to yield plasmid pCGN8554.

The mutagenesis of ORF 8 used a different strategy. A BamHI fragment
containing ORF 8 was subcloned into plasmid pHC79 to yield cosmid #9". A SalI site
upstream of the coding region was introduced on an adapter comprised of the
oligonucleotides TCGACATGGAAAATATTGCAGTAGTAGGTATTGCTAATTTGTTC
and CCGGGAACAAATTAGCAATACCTACTACTGCAATATTTTCCATG.
The adapter was ligated to cosmid #9" after digestion with SalI and XmaI. A SalI site was
introduced downstream of the stop codon by using PCR for mutagenesis. A DNA
fragment containing the stop codon was generated using cosmid #9" as a template with
the primers TCAGATGAACTTTATCGATAC and
TCATGAGACGTCGTCGACTTACGCTTCAACAATACT. The PCR product was digested with the
restriction endonucleases ClaI and AatII and was cloned into the cosmid #9" derivative
digested with the same enzymes to yield plasmid 8P3. The SalI fragment from 8P3 was
cloned into SalI digested pCGN7770 to yield pCGN8515.

pCGN8532, a binary plant transformation vector that contains Shewanella
ORF 3 under control of the napin promoter, was digested with NotI, and a NotI fragment
of pCGN8539 containing a napin/Vibrio ORF 7 gene fusion was inserted to yield
pCGN8552. Plasmid pCGN8556 (Figure 23), which contains Shewanella ORF 3 and
Vibrio ORFs 7 and 9 under control of the napin promoter, was constructed by cloning the
Sse8387I fragment from pCGN8543 into Sse8387I digested pCGN8552.

The NotI digested napin/ORF 8 gene from plasmid pCGN8515 was cloned into a
NotI digested plant binary transformation vector pCGN5139 to yield pCGN8548. The
Sse8387I digested napin/ORF 6 gene from pCGN8554 was subsequently cloned into the
Sse8387I site of pCGN8566. The resultant binary vector containing the napin/ORF 6 gene
fusion and napin/ORF 8 gene fusion was designated pCGN8560 (Figure 22).

Example 5 

Plant Transformation and PUFA Production 

EPA production 

The Shewanella constructs pCGN8535 and pCGN8537 can be transformed into
the same or separate plants. If separate plants are used, the transgenic plants can be
crossed resulting in heterozygous seed which contains both constructs.
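
The bookkeeping behind the choice of constructs can be sketched as a set comparison. The ORF content assigned to each construct follows Example 4 (pCGN8535 carries the napin/ORF 6 and napin/ORF 8 fusions; pCGN8537 carries the napin/ORF 3, ORF 7 and ORF 9 fusions); the function and variable names are illustrative only.

```python
REQUIRED_ORFS = {3, 6, 7, 8, 9}

# ORF content of each binary vector, as described in Example 4.
CONSTRUCT_ORFS = {
    "pCGN8535": {6, 8},
    "pCGN8537": {3, 7, 9},
}


def missing_orfs(constructs):
    """Return the required ORFs not supplied by the given constructs."""
    present = set()
    for name in constructs:
        present |= CONSTRUCT_ORFS[name]
    return REQUIRED_ORFS - present


print(missing_orfs(["pCGN8535"]))              # ORFs still to be supplied
print(missing_orfs(["pCGN8535", "pCGN8537"]))  # empty set: full complement
```

A plant carrying only one construct lacks part of the complement, which is why co-transformation or crossing is used to bring the two insertions together.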

pCGN8535 and pCGN8537 are separately transformed into Brassica napus.
Plants are selected on media containing kanamycin and transformation by full length
inserts of the constructs is verified by Southern analysis. Immature seeds also can be
tested for protein expression of the enzyme encoded by ORFs 3, 6, 7, 8, or 9 using
western analysis, in which case the best expressing pCGN8535 and pCGN8537 T1
transformed plants are chosen and are grown out for further experimentation and crossing.
Alternatively, the T1 transformed plants showing insertion by Southern are crossed to one
another, producing T2 seed which has both insertions. In this seed, half seeds may be
analyzed directly for expression of EPA in the fatty acid fraction. Remaining half-seed
of events with the best EPA production are grown out and developed through
conventional breeding techniques to provide Brassica lines for production of EPA.

Plasmids pCGN7792 and pCGN7795 also are simultaneously introduced into
Brassica napus host cells. A standard transformation protocol is used (see, for example,
USPN 5,463,174 and USPN 5,750,871); however, Agrobacteria containing both plasmids
are mixed together and incubated with Brassica cotyledons during the cocultivation step.
Many of the resultant plants are transformed with both plasmids.

DHA production 

A plant is transformed for production of DHA by introducing pCGN8556 and
pCGN8560, either into separate plants or simultaneously into the same plants as described
for EPA production.

Alternatively, the Shewanella ORFs can be used in a concerted fashion with ORFs
6 and 8 of Vibrio, such as by transforming a plant with the constructs pCGN8560 and
pCGN7795, allowing expression of the corresponding ORFs in a plant cell. This
combination provides a PKS-like gene arrangement comprising ORFs 3, 7 and 9 of
Shewanella, with an ORF 6 derived from Vibrio and also an ORF 8 derived from Vibrio.
As described above, ORF 8 is the PKS-like gene which controls the identity of the final
PUFA product. Thus, the resulting transformed plants produce DHA in plant oil.

Example 6 

Transgenic plants containing the Shewanella PUFA genes 

Brassica plants 

Fifty-two plants cotransformed with plasmids pCGN8535 and pCGN8537 were
analyzed using PCR to determine if the Shewanella ORFs were present in the transgenic
plants. Forty-one plants contained plasmid pCGN8537, and thirty-five plants contained
pCGN8535. Eleven of the plants contained all five ORFs required for the synthesis of EPA.
Several plants contained genes from both of the binary plasmids but appeared to be
missing at least one of the ORFs. Analysis is currently being performed on approximately
twenty additional plants.
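
The cotransformation counts reported above can be put on a common footing with a short calculation. The sketch below is only an illustration (the function name and dictionary keys are hypothetical): it converts the counts to percentages of the fifty-two plants analyzed and bounds the number of plants that must carry sequences from both plasmids. The lower bound follows from simple counting and is not a figure reported in the text.

```python
def summarize_cotransformation(analyzed, with_8537, with_8535, with_all_five):
    """Summarize PCR screening counts for cotransformed plants."""
    def pct(n):
        return round(100 * n / analyzed, 1)

    return {
        "pct_pCGN8537": pct(with_8537),
        "pct_pCGN8535": pct(with_8535),
        "pct_all_five_orfs": pct(with_all_five),
        # Counting bound: |A and B| >= |A| + |B| - N, and <= min(|A|, |B|).
        "min_plants_with_both": max(0, with_8537 + with_8535 - analyzed),
        "max_plants_with_both": min(with_8537, with_8535),
    }


# Counts taken from the paragraph above.
summary = summarize_cotransformation(
    analyzed=52, with_8537=41, with_8535=35, with_all_five=11
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The gap between the plants carrying both plasmids and the eleven carrying all five ORFs is consistent with the observation that several plants were missing at least one ORF.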

Twenty-three plants transformed with pCGN8535 alone were analyzed using PCR
to determine if the Shewanella ORFs were present in the transgenic plants. Thirteen of
these plants contained both Shewanella ORF 6 and Shewanella ORF 8. Six of the plants
contained only one ORF.

Nineteen plants transformed with pCGN8537 alone were analyzed using PCR to
determine if the Shewanella ORFs were present in the transgenic plants. Eighteen of the
plants contained Shewanella ORF 3, Shewanella ORF 7, and Shewanella ORF 9. One
plant contained Shewanella ORFs 3 and 7.

Arabidopsis

More than 40 transgenic Arabidopsis plants cotransformed with plasmids
pCGN8535 and pCGN8537 are growing in our growth chambers. PCR analysis to
determine which of the ORFs are present in the plants is currently underway.
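
The PCR screens in this example amount to comparing the ORFs detected in each plant against the five Shewanella ORFs required for EPA synthesis. A minimal sketch of that bookkeeping follows. The assignment of ORFs 6 and 8 to pCGN8535 and ORFs 3, 7 and 9 to pCGN8537 is inferred from the single-plasmid results reported above, and the function names are hypothetical.

```python
# The five Shewanella ORFs the text describes as required for EPA synthesis.
REQUIRED_ORFS = frozenset({3, 6, 7, 8, 9})

# ORF content per plasmid, inferred from the single-plasmid PCR results.
PLASMID_ORFS = {
    "pCGN8535": frozenset({6, 8}),
    "pCGN8537": frozenset({3, 7, 9}),
}


def missing_orfs(detected):
    """Return the required ORFs not detected in a plant, in ascending order."""
    return sorted(REQUIRED_ORFS - set(detected))


def plasmids_represented(detected):
    """Return the plasmids from which at least one ORF was detected."""
    found = set(detected)
    return sorted(name for name, orfs in PLASMID_ORFS.items() if orfs & found)


def classify(detected):
    """Label a plant as 'complete', 'partial' or 'none' by ORF content."""
    found = set(detected) & REQUIRED_ORFS
    if found == REQUIRED_ORFS:
        return "complete"
    return "partial" if found else "none"


# Example: the pCGN8537 plant reported to contain only ORFs 3 and 7.
print(classify({3, 7}), missing_orfs({3, 7}), plasmids_represented({3, 7}))
```

A plant classified as partial but representing both plasmids corresponds to the case noted above, in which genes from both binary plasmids are present but at least one ORF is missing.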

By the present invention PKS-like genes from various organisms can now be used
to transform plant cells and modify the fatty acid compositions of plant cell membranes or
plant seed oils through the biosynthesis of PUFAs in the transformed plant cells. Due to
the nature of the PKS-like systems, fatty acid end-products produced in the plant cells can
be selected or designed to contain a number of specific chemical structures. For example,
the fatty acids can comprise the following variants: variations in the numbers of keto or
hydroxyl groups at various positions along the carbon chain; variations in the numbers
and types (cis or trans) of double bonds; variations in the numbers and types of branches
off of the linear carbon chain (methyl, ethyl, or longer branched moieties); and variations
in saturated carbons. In addition, the particular length of the end-product fatty acid can be
controlled by the particular PKS-like genes utilized.

All publications and patent applications mentioned in this specification are
indicative of the level of skill of those skilled in the art to which this invention pertains.
All publications and patent applications are herein incorporated by reference to the same
extent as if each individual publication or patent application was specifically and
individually indicated to be incorporated by reference.

The invention now being fully described, it will be apparent to one of ordinary
skill in the art that many changes and modifications can be made thereto without
departing from the spirit or scope of the appended claims.

What is claimed is:

1. An isolated nucleic acid comprising:

a Vibrio marinus nucleotide sequence selected from the group consisting of the
ORF 6, ORF 7, ORF 8 and ORF 9 as shown in Figure 6.

2. An isolated nucleic acid comprising: 

a nucleotide sequence which encodes a polypeptide of a polyketide-like synthesis system, 
wherein said system produces a docosahexenoic acid when expressed in a host cell. 

3. The isolated nucleic acid according to Claim 2, wherein said nucleotide
sequence is derived from a marine bacterium.

4. The isolated nucleic acid according to Claim 2, wherein said nucleotide 
sequence is a Vibrio marinus ORF 8 as shown in Figure 6. 




5. An isolated nucleic acid comprising: 

a nucleotide sequence which is substantially identical to a sequence of at least 50 
nucleotides of a Vibrio marinus nucleotide sequence selected from the group consisting of 
ORF 6, ORF 7, ORF 8 and ORF 9 as shown in Figure 6.

6. A recombinant microbial cell comprising at least one copy of an isolated 
nucleic acid according to Claim 1 or Claim 2. 


7. The recombinant microbial cell according to Claim 6, wherein said cell
comprises each element of a polyketide-like synthesis system required to produce a long
chain polyunsaturated fatty acid.

8. The recombinant microbial cell according to Claim 7, wherein said cell is a 
eukaryotic cell. 


9. The recombinant microbial cell according to Claim 8, wherein said eukaryotic 
cell is a fungal cell, an algae cell or an animal cell. 

10. The recombinant microbial cell according to Claim 9, wherein said fungal cell
is a yeast cell and said algae cell is a marine algae cell.

11. The recombinant microbial cell according to Claim 6, wherein said cell is a
prokaryotic cell.

12. The recombinant microbial cell according to Claim 11, wherein said cell is a
bacterial cell or a cyanobacterial cell.

13. The microbial cell according to Claim 6, wherein said recombinant microbial
cell is enriched for 22:6 fatty acids as compared to a non-recombinant microbial cell
which is devoid of said isolated nucleic acid.

14. A method for production of docosahexenoic acid in a microbial cell culture,
said method comprising:

growing a microbial cell culture having a plurality of microbial cells, wherein said
microbial cells or ancestors of said microbial cells were transformed with a vector
comprising one or more nucleic acids having a nucleotide sequence which encodes a
polypeptide of a polyketide synthesizing system, wherein said one or more nucleic acids
are operably linked to a promoter, under conditions whereby said one or more nucleic
acids are expressed and docosahexenoic acid is produced in said microbial cell culture.

15. A method for production of a long chain polyunsaturated fatty acid in a plant
cell, said method comprising:

growing a plant having a plurality of plant cells, wherein said plant cells or
ancestors of said plant cells were transformed with a vector comprising one or more
nucleic acids having a nucleotide sequence which encodes one or more polypeptides of a
polyketide synthesizing system which produces a long chain polyunsaturated fatty acid,
wherein each of said nucleic acids are operably linked to a promoter functional in a plant
cell, under conditions whereby said polypeptides are expressed and a long chain
polyunsaturated fatty acid is produced in said plant cells.

16. The method according to Claim 15, wherein said long chain polyunsaturated
fatty acid produced in said plant cells is a 20:5 and 22:6 fatty acid. 


17. The method according to Claim 15, wherein said nucleic acids comprise
nucleotide sequences encoding any one of the polypeptides selected from the group
consisting of Vibrio marinus ORF 6, ORF 7, ORF 8 and ORF 9 as shown in Figure 6 and
Shewanella putrefaciens ORF 3, ORF 6, ORF 7, ORF 8 and ORF 9 as shown in Figure 4.


18. The method according to Claim 15, wherein said nucleic acid constructs are derived
from two or more polyketide synthesizing systems.


19. A recombinant plant cell which produces a long chain polyunsaturated fatty acid
exogenous to said plant cell, wherein said recombinant plant cell is produced according to a method
comprising:

transforming a plant cell or an ancestor of said plant cell with a vector comprising
one or more nucleic acids having a nucleotide sequence which encodes one or more
polypeptides of a polyketide synthesizing system which produces a long chain
polyunsaturated fatty acid, wherein each of said nucleic acids are operably linked to a
promoter functional in said plant cell whereby a recombinant plant cell is obtained; and

growing said recombinant plant cell under conditions whereby said polypeptides
are expressed and a long chain polyunsaturated fatty acid is produced in said plant cell.

20. The recombinant plant cell according to Claim 19, wherein said recombinant plant cell 
is a recombinant seed cell. 


21. The recombinant plant cell according to Claim 20, wherein said recombinant seed cell is 
a recombinant embryo cell. 

22. The method according to Claim 15, wherein said long chain polyunsaturated fatty acid
produced in said plant cells is eicosapentenoic acid.

23. The method according to Claim 15, wherein said long chain polyunsaturated fatty acid 
produced in said plant cells is docosahexenoic acid. 

24. The recombinant plant cell according to Claim 19, wherein said recombinant plant cell
is from a plant selected from the group consisting of Brassica, soybean, safflower, and sunflower. 

25. A plant oil produced by a recombinant plant cell according to Claim 19, wherein said
plant oil comprises eicosapentenoic acid.

26. A plant oil produced by a recombinant plant cell according to Claim 19, wherein said 
plant oil comprises docosahexenoic acid. 


27. The plant oil according to Claim 25 or Claim 26, wherein said plant oil is encapsulated. 

28. A dietary supplement comprising a plant oil according to Claim 27.

29. A recombinant E. coli cell which produces docosahexenoic acid.

30. A plant oil comprising eicosapentenoic acid. 

31. A plant oil comprising docosahexenoic acid.


32. The recombinant microbial cell according to Claim 12, wherein said bacterial 
cell is a lactobacillus cell. 

Fig. 2

hglD hglC OrfX hglB hglA hetI

KAS CLF AT ACP ? ? KR P-T

Anabaena "PKS" Genes Involved in Heterocyst Glycolipid Synthesis

Fig. 5. Orf3 Encodes a Phosphopantetheine Transferase

1. pUC19
2. pPA-NEB (Δ Orf3)
3. pAA-Neb (EPA +)
4. Orf6 subclone
5. Orf6 + Orf3 subclones
6. Orf3 subclone

Autoradiograph of [C14] β-alanine labelled proteins from E.
coli (strain SJ16) cells transformed with the above listed
plasmids. Cells were grown in the presence of [C14] β-alanine
and the appropriate antibiotics. Proteins were extracted,
separated by SDS-PAGE and transferred to a PVDF membrane
prior to autoradiography. ACP and an unknown (but
previously observed) 35 kD protein were labelled in all of the
samples. The high molecular mass proteins detected in lanes
2 and 5 are full-length (largest band) and truncated
products of the Shewanella Orf6 gene (confirmed by Western
analysis - data not shown). E. coli strain SJ16 is conditionally
blocked in β-alanine synthesis.

Sequence Range: 1 to 37895

20 40 60 80

• ••••••• 

GATCTCTTAC AAAGAAACTA TCTCAATGTG AATTTAACCT TJUlTTO CC n TAATTACCGC CTGATACACC ATCACCCAAT 

100 120 140 160 

• • • 9 ^ * , • ♦ • 

CACCCATAAA ACTOTAAAOT GCOTACTCAA AGCTCOCTOC CCCATTCTTC TCAAATACAA ACTCCCCAAC CCAACCAAAT 

180 200 220 240

• • • • • • 

CCATAimUl TAACAGGTAA AAGTAGCAAT AAACCCCACC GCTGAGTTAC TAATACATAA GCGAATAATA GGATCACTAA 

260 280 300 320

• « * • • * • ■• 

ACTACTCCCG AAATAGTCTA ATATTCCACA CTTTCTATGC TGATCTTGAC ATAAATAAAA AGGOTAAAAT TCACCAAAAG 
340 360 380 400 

AACGATAOCG CTTACTCATT ACTCACACCT CGGTAAAAAA GCAACTCGCC ATTAACTTGC CCAATCCTCA CTTOTTCTAT 

420 440 460 480

• ••■•*** 

CGTCtCAAAG TTATGCCGAC TAAATAACTC TATATGTOCA TTATGATTAC CAAAAACTCC GATACCATCA AGATGAAGTT 

500 520 540 560

GTTCATCACA CCAACTCAAA ACTGCCTCCA TAAGCTTACT CCCATAGCCC TTGCCTTGCT CCACATTTGC OATAGCAATA 

580 600 620 640 

AACTCTAAAA TCCCACATTC OCCACTTGGT AACCTCTCTA TAATCTGATT TTCTTTGTTA ATAACTCCCT CAGTTGAATA 

660 680 700 720 

• ••••••• 

CCAACCACTA CTTAACAACA TCTTTAAACG CCAATGCCAA AAACCCGCTT CACCTAACCG AACCTGCTGA GTCAC7ATGC 

740 760 780 800

• ••••••• 

AGGCTACGCC TATCAATCTA TCCCCAACGA ACATACCAAT AAGTGCTTGC TCCTCTTCCC ACACCTCATT GAGTTCTTCT 

820 840 860 880 

• •••«••• 

CCAATAGCCC CGCCAACCTT TTGCTCATAC TOCCCTTGAT CACCACTAAA AAGTGTTTCG ATAAAAAAGG CATCATCATG 

900 920 940 960 

ATACCCCTTA TAGACAATAG AGCCTGCTAT GCGTAAATCT TCTCCCCTCA GATAAACTGC ACGACACTCT TCCATCCCTT 

980 1000 1020 1040 

• •»••••• 

CATCTTCCAT TGTTATTGTC CTTGACCTTG ATCACACAAC ACCAATGTAA CAAGACTCTA TACAACTGCA ATTAATAATC 

1060 1080 1100 1120

AATTCGTGCA TTAAGCAG5T CAGCATTTCT TTGCTAAACA AGCTTTATTG GCTTTGACAA AACTTTGCCT AGACrTTAAC 

1140 1160 1180 1200 

• ••••**• 
GATAGAAATC ATAATGAAAC AGAAAAGCTA CAACCTAGAC GGCAATAATC AAACAACTGC TAAGAtCTAG ATAATCTAAT 

1220 1240 1260 1280 

• ••••••• 

AAACACCCAG rPTATCGACC ATACTTACAT AGACTCATAG CAACCAGAAT AGrTATQGAT ACAACGCCCC AACATCTATC 

1300 1320 1340 1360

• ••••••• 

ACACCTGTTT TTACAGCTAG GATTAGCAAA TGATCAACCC CCAATTGAAC AGTTTATCAA TGACCATCAA TTACCGCACA 

1380 1400 1420 1440 

ATATATTGCT ACATCAACCA AGCTTTTCGA GCCCATCGCA AAAGCACTTC TTAATTGACT CATTTAATCA ACATCCCCAG 

1460 1480 1500 1520

• * • 

TGGACCGAAG TCATCCACCA CTTAGACACC TTATTAAGAA AAAACTAACC ATTACAACAC CAACTTTAAA TTTTGCCCTA 
1540 1560 1580 1600

AGCCATCTCC CCCCACCCCA CAACAGCCTT GTTCCTTATG ACCACTOCiyp TACATTCGTC TTTAGTCGTT TTACCATCAC
1620 1640 1660 1680

CATGCCTACG TTGACTCCGA TAAAAAACCA CATAAACTTC TTTAICGGCC TGAATATAGC CTTCCTTAAA ATCAGCTCTT 

1700 1720 1740 

• * ' • * * * " __* 
CCCAtTAAAG TAACCACTTC CTCTTTACTC ATCCCTAGAG ATATCTTTCT CAAATTCTCA CGGTTTTTAT CTTCMTTTT 
1780 1800 1820 1840

♦ •«••• 
CTCCCAAGCA CCGTCATTAT CCCACTCAGA TTCCCCATCA CCAACATTGA CCACACAGCC CGTTAGCCCT AAGCTTGCAA 

1860 1880 1900 1920

• **••••• 
TCCCAAAACA TGCTAAACCT AATAATTTAT TTTTCATTTT AACTTCCyCT TATGACATTA TTTTTGCTTA GAAGAAAAGC 

1940 1960 1980 2000

AACTTACyiTG CCAAAACACA AGCTGTTCTT TTAAATCACT TTATTTATTA TTAGCCTTTT AGGATATGCC TAGAGCAATA 

2020 2040 2060 2080 

ATAATTACCA ATGTTTAAGG AATTTGACTA ACTATGAGTC CGATTGAGCA AGTGCTAACA GCTGCTAAAA AAATCAAT6A 

2100 2120 2140 2160 

ACAAGGTAGA GAACCAACAT TAGCATTGAT TAAAACCAAA CTTGOTAATA GCATCCCAAT GCGCGACTTA ATCCAAGGTT 

2180 2200 2220 2240 

TGCAACAGTT TAAGTCTATG AGTGCAGAAC AAAGACAAGC AATACCTAGC AGCTTAGCAA CAGCAAAAGA AACTCAATAT 

2260 2280 2300 2320 

* ••••••• 

GGTCAATCAA GCTTATCTCA ATCTGAACAA GCTGATAGGA TCCTCCAGCT AGAAAACGCC CTCAATGAAT TAAGAAACGA 

2340 2360 2380 2400 

ATTTAATGGG CTAAAAAGTC AATTTGATAA CTTACAACAA AACCTGATGA ATAAAGAGCC TGACACCAAA TGCATGTAAT 

2420 2440 2460 2480 

* . • * * 

TGAACTACGA TTTGAATGTT TTGATAACAC CACGATTACT GCAGCAGAAA AAGCCATTAA TGGTTTGCTT GAAGCTTATC 

2500 2520 2540 2560

GAGCCAATGG CCAGGTTCTA GGTCGTGAAT TTGCCGTTGC ATTTAACGAT GGTGAGTTTA AACCACGCAT GTTAACCCCA 

2580 2600 2620 2640 

GAAAAAAGCA GCTTATCTAA ACGCTTTAAT AGTCCTTGGG TAAATAGTGC ACTCGAAGAG CTAACCGAAG CCAAATTGCT 

2660 2680 2700 2720

#••••*♦• 

TOCGCCACGT GAAAAGTATA TTGGCCAAGA TATTAATTCT GAACCATCTA GCCAAGACAC ACCAAGTTGG CAGCTACTTT 

2740 2760 2780 2800 

• ••••*•* 
ACACAACTTA TGTGCACATG TGCTCACCAC TAAGAAATGG CGACACCTTG CAGCCTATTC CACTGTATCA AATTCCAGCA 

2820 2840 2860 2880 

ACTCCCAACG GCGATCATAA ACGAATGATC CGTTGGCAAA CAGAATGGCA AGCTTGTGAT GAATTGCAAA TGGCCGCAGC 

2900 2920 2940 2960 

TACTAAAGCT GAATTTGCCG CACTTGAAGA GCTAACCAGT CATCAGAGTG ATCTATTTAG GCGTGGTTGG GACTTACGTG 

2980 3000 3020 3040 

GCACAGTCGA ATACTTGACG AAAATTCCGA CCTATTACTA TTTATACCGT OTTGGCGGTG AAAGCTTAGC AGTAGAAAAG 

3060 3080 3100 3120 

CAGCGCTCTT GTCCTAAGTG TGGCAGTCAA GAATGGCTCC TCGATAAACC ATTATTGGAT ATGTTCCATT TTCGCTGTGA 

3140 3160 3180 3200

CACCTGCCCC ATCGTATCTA ATATCTCTTG GCMICCATTTA TAACTCTTCC GAGTCTTATC ACACTACAGT TTAGTCAGCA 

3220 3240 3260 3280 

• *•••*•* 

TAAAAATGGC GCTTATATTT CAATTAAAAG AAATATAAGC GCCATTTTCA TCGATACTAT ATATCAGCAG ACTATTTTCC 

3300 3320 3340 3360 

GCGTAAATTA GCCCACATTA ATTTCATTCT TTGCCAGATC CCTGGATGAT CTAGTTGTGG CATCGACTCT TCAATAGCTT 

3380 3400 * 3420 3440 

TAACCGCAOG TGTAACCCTT GGAGTCAATT CGTTTATAAA CTCGTTTAAA CTGTCACTTA ATTTAACCCT TTGTACTTCA 

3460 3480 3500 3520 

* . • • • • 

ccTGGAArrr CAATCCATAC GCTGCCATCA CTATTATTAA CCGTCAACAT TTTATCTTCA TCATCAAGAA taccaataaa 


Fig. 4
3540 3560 3580 3600

CCAAGTCGGC TCTTGCTTAA GCTTTCTCT? CATCATTAAA TCACCAATCA TGTTTTGTTO TAACTATTCA AAATCAGTT? 

3620 3640 3660 3680 

• * • • • 

GAICCCACAC TTGGATTACC TCACCTTCGC CCCATTGTGA GTCAAAAAAT AGCGGTCCAG AAAAATGACT GCCAAAAAAT 

3700 3720 * 3740 3760 

* • • • • 
GGATTAATTP CTGCAGATAA TGTCATTTCA AGTGCTGTPT CAACATTAGC AAATTCACCA GGTTGTTGAC CTACAACCGA 

3780 3800 3820 3840 

rrCCCAAAAC ACTGCGCCAT CGGAGCCCGC TTCGGCGACA ACACACTCAG ACTTTTGTCC TTGCGCATAA TATCTTGGCT 
3860 3880 3900 3920 

6TTCACCAAG CTTATCCATO TAGGCTTGTT GATATTTAGA TAAAAAAACA TCTAAAGCAG GTAAAGAAGA CACTTAAGCC 
3940 3960 3980 4000

AGTTCCAAAA TCAGTTATAA TAGGGGTCTA TTTTGACATG GAAACCGTAT TGATGACACA ACATCATGAT CCCTACAGTA 

4020 4040 4060 4080 

♦ • • • • 

ACGCCCCCGA ACTTTCTGAA TTAACTTTAG GAAAQTCGAC CGGTTATCAA GAGCAGTATG ATGCATCTTT ACTACAAGCC 

4100 4120 4140 4160 

TGCCGCGTAA ATTAAACCGT GATGCTATCG GTCTAACCAA TGAGCTACCT TTTCATGGCT GTGATATTTG GACTGGCTAC 

4180 4200 4220 4240 

GAACTGTCTT GGCTAAATGC TAAAGGCAAG CCAATGATTG CTATTGCACA CTTTAACCTA AGTTTTGATA GTAAAAATCT 

4260 4280 4300 4320 

• ♦ • • * 

GATCGAGTCT AAGTCGTTTA AGCTGTATTT AAACAGCTAT AACCAAACAC GATTTGATAG CGTTCAAGCG GTTCAAGAAC 

4340 4360 4380 4400 

GTTTAACTGA AGACTTAAGC GCCTGTGCCC AAGGCACAGT TACGGTAAAA GTGATTGAAC CTAAGCAATT TAACCACCTG 

4420 . 4440 4460 4480 

AGAjCTGGTTG ATATGCCAGG TACCTGCATT GACGATTTAG ATATTGAAGT TGATGACTAT AGCTTTAACT CT6ACTATCT 

4500 4520 4540 4560 

CACCGACAGT GTTGATGACA AAGTCATGCT TGCTGAAACG CTAACGTCAA ACTTATTGAA ATCAAACTGC ctaatcactt 

4580 4600 4620 4640 

• «#•♦*•* 
CTCAGCCTGA CTGGGGTACA GTGATGATCC GTTATCAAGG gcctaagata gaccgtgaaa agctacttag atatc?gatt 

4660 4680 .4700 4720 

tcatttagac agcacaatga atttcatgag cagtctgttg agcgtatatt tgttgattta aagcactatt cccaatgtgc 

4740 4760 4780 4800

• «♦•** •* 

CAAACTTACT GTCTATGCAC GTTATACCCG CCGTGGTGGT TTAGATATCA ACCCATATCG TACCGACTTT GAAAACCCTG 

4820 4840 4860 4880 

CAGAAAATCA GCGCCTACCG AGACAGTAAT TGATTGCAGT ACCTACAAAA AACAATGCCT ATAAGCCAAQ COTATGGGCA 

4900 4920 4940 4960 

« •«••••* 

rrrPTATATT atcaacttct catcaaacct cagccgccaa gccttttagt tttatcgcta aattaagccg ctctctcagc 

4980 5000 5020 5040 

CAAATATTTG CAGGATTTTO CTGTAATTTA TGGCTCCACA CCATGAAATA CTCTATCGGC TCTACCGCAA AAGGTAAGTC 

5060 5080 5100 5120 

AAATACCTGT AAGCCAAACA GCTTGGCATA TTCGTCACTG TGGGCTTTTG ACGCCATAGC TAACGCATCA CTTTTTGAGG 

5140 5160 ♦ 5180 5200 

* *•••••* 

CAACCOACAT CATACTTAAT ATTGATGATT GCTCGCTCTG CATTTGCCTT GCCGGTAACA CCTCTTTAGT CAGCAAGTCG 
5220 5240 5260 5280 

GCAACACTTA AATTGTAGCC GCGCATCTTA AAAAtAATAT GCTTTTCATT AAAGTATTGC TCTTGCGTCA ACCCACCTTC 
5300 5320 5340 5360 
GATCCTTCGG TGAGCATTTC GTGCCACACA AACTAATTTA TCCTGCATTA CTTTTTGACT CTTAAATGCC GCACATTCTG

5380 5400 5420 5440 

♦ ♦ * * * * 

GCAGCCAAAT ATCTAAGGCT AAATCCACCT TTTCTAGTTG TAGGTCCATC TGCAACTCTT CTTCAATGAG CGGCGGCTCA 

5460 5480 ♦ 5500 5520 

• * * ♦ • 

COAAATACAA TATTAATTGC ACTGCCCTGT AACACTTGCT CAATTTGATC TTGCAACAGT TCTATTGCCG ACTCGCTGGC 

5540 5560 5580 5600 

ATACACATAA AAAGTTCGCT CACTTGAAGT GGGGTCAAAT GCTTCAAAGC TAGTCGCAAC TTGCTCAATT GTTGACATAG 

5620 5640 5660 5680 

CGCCCGCGAO CTGTTGATAA AOCGTCATCG CACTTGCGGT AGCTTTAACT CCCCTACCCA CTCGAGTAAA CAACTCTTCT 

5700 5720 5740 5760

CCAACyiATAC TTTTTAGCCT CCAAATCGCA TTACTAACCG ACCACTCAGT CAAATCCAGC TCTTCTGCCG CCCOGCTAAA 

5780 5800 5820 5840

AGATGAGGTG CGATACACCG CAGTAAAAAC GCGAAATAAA TTAAGATCAA AAGCTTTTTG CTGCGACATA AATCAGCTAT 

5860 5880 5900 5920

♦ « * • • • • • 
CTCCTTATCC TTATCCTTAT CCTTATAAAA AGTTAGCTCC AGAGCACTCT ACCTCAAAAA CAACTCAGCG TATTAAGCCA 

5940 5960 5980 6000 

ATATTTTGGG AACTCAATTA ATATTCATAA TAAAAGTATT CATAATATAA ATACCAAGTC ATAATTTAGC CCTAATTATT 

6020 6040 6060 6080 

AATCAATTCA AGTTACCTAT ACTGGCCTCA ATTAAGCAAA TGTCTCATCA GTCTCCCTGC AACTAAATGC AATATTGAGA 

6100 6120 6140 

• • « • • * • 

CATAAACCTT TGAACTCATT CAATCTTACG ^gjJPTAACTT ATf; AAA CAG ACT CTA ATG GCT ATC TCA ATC ATG 

MKQTLMAlSrK> 


6160 


6180 


6200 


TCG CTT TTT TCA TTC AAT GCG CTA GCA GCG CAA CAT GAA CAT GAC CAC ATC ACT GTT GAT TAC GAA 
SLFSFNALAAQHEHDHITVDYB> 


6220 


6240 


6260 


6280 


GOG AAA CCC GCA ACA GAA CAC ACC ATA GCT CAC AAC CAA GCT GTA GCT AAA ACA CTT AAC TTT GCC 
GKAATEHTIAHrJQAVAKTLNFA> 


6300 


6320 


6340 


GAC ACG CGT GCA TTT GAG CAA TCG TCT AAA AAT CTA GTC GCC AAG TTT GAT AAA GCA ACT GCC GAT 
DTRAFEQSSKNLVAKFDKATAD> 


6360 


6380 


6400 


ATA TTA CGT CCC GAA TTT GCT TTT ATT ACC GAT GAA ATC CCT GAC TCG GTT AAC CCG TCT CTC TAC 
ILRAEFAFlSDElPDSVNPSLy> 


6420 


6440 


6460 


6460 


CGT CAG GCT CAG CTT AAT ATG GTG CCT AAT GGT CTG TAT AAA GTG ACC GAT GGC ATT TAC CAG GTC 
R<lAQLNMVPNGLYKVSOGiyQV> 


6500 


6520 


6540 


CGC GGT ACC GAC TTA TCT AAC CTT ACA CTT ATC CGC ACT GAT AAC GGT TCG ATA GCA TAC GAT GTT 
RGTDLSNLTLIRSONGWIAYDV> 


6560 


6580 


6600 


TTG TTA ACC AAA GAA GCA GCA AAA GCC TCA CTA CAA TTT GCG TTA AAG AAT CTA CCT AAA GAT GGC 
Ll.TREAAKASLQrALKNLPKDC> 


6620 


6640 


6660 


6680 


GAT TTA CCC GTT GTT GCG ATG ATT TAC TCC CAT AGC CAT GCG GAC CAC TTT GGC GGA GCT CCC GGT 
DLPVVAH1YSHSHADBFGGARG> 


6700 


6720 


6740 


GTT CAA GAG ATC TTC CCT GAT GTC AAA GTC TAC GGC TCA GAT AAC ATC ACT AAA GAA ATT GTC CAT 
VQEMFPOVKVYOSDNITKEIV1» 


6760


6780 6800
• • * • 

GAG AAC GTA 
E N V 

CTT GCC 
L A 

GGT 
G 

AAC GCC ATG AGC CGC CGC GCA OCT TAT CAA TAC CGC GCA ACA CTG GGC 
NAMSRRAAYQYGATLO 

6820 


* 

6840 6860 

* . • ♦ 

• 

AAA CAT GAC 
K H D 

CAC GOT 
H G 

ATT 
I 

GTT GAT GCT GCG CTA GGT AAA CGT CTA TCA AAA GGT GAA ATC ACT TAC 
VO AALGKGLSKGEITY> 

6880 


6900 6920 6940 
. » • • 

CTC CCC CCA 
V A P 

GAC TAC 
D Y 

ACC 
T 

TTA AAC ACT GAA GGC AAA TGG GAA ACG CTG ACG ATT GAT GGT CTA GAG 
LNSEGKWETLTIDGLE> 


6960 


6980 7000 
♦ * • • • 

* 

ATG GTG TTT 
M V F 

• 

ATG GAT 
M 0 

GCC 
A 

TOG GGC ACC GAA GCT GAG TCA GAA ATG ATC ACT TAT ATT CCC TCT AAA 
SGTEAESBMITVIPSK> 

7020 
* 



7040 7060 
• 

AAA GCG CTC 
R A L 

TGG ACG 
W T 

GCG 
A 

GAG CTT ACC TAT CAA CGT ATG CAC AAC ATT TAT ACG CTG CGC GGC GCT 
El.TYQGMHNIYTLRGA> 

7080 

* 


7100 7120 7140 

• 

AAA GTA CGT 
K V R 

GAT GCG 
D A 

CTC 

L 

AAG TGG TCA AAA GAT ATC AAC GAA ATG ATC AAT GCC TTT GGT CAA GAT 
KWSKDIMBMINAFGQD> 


7160 

7180 7200 
* • • * 

• 

GTC GAA GTG 
V E V 

CTG TTT 
L F 

GCC 
A 

TCG CAC TCT GCG CCA GTG TGG GGT AAC CAA GCG ATC AAC GAT TTC TTA 
SHSAPVWGNQAINDFL> 

7220 

• • 


7240 7260 
* • • ♦ • 


CGC CTA CAG CGT GAT AAC TAC GGC CTA GTG CAC AAT CAA ACC TTG AGA CTT GCC AAC OAT GGT GTC 
RLQRDNYGLVHNQTLRLANDGV> 

7280 7300 7320 7340 

GOT ATA CAA GAT ATT GGC GAT GCG ATT CAA GAC ACG ATT CCA GAG TCT ATC TAC AAG ACG TGG CAT 
0IQD IGDAIQDT1PBSIYKTWH> 

7360 7380 7400 

ACC AAT GGT TAC CAC GGC ACT TAT AGC CAT AAC GCT AAA GCC GTT TAT AAC AAG TAT CTA GGC TAC 
TNGYHGTYSHNAKAVYNKYLOY> 

7420 7440 7460 

TTC GAT ATG AAC CCA GCC AAC CTT AAT CCG CTG CCA ACC AAG CAA GAA TCT GCC AAG TTT GTC GAA 
FOMNPANLNPI.PTKQESAKFVE> 

7480 7500 7520 

* 

TAC ATG GGC GGC GCA GAT GCC GCA ATT AAG CGC GCT AAA GAT GAT TAC GCT CAA GGT GAA TAC CGC 
YMGGADAAIKRAKDDYAOGEyR> 

7540 7S60 7580 7600 

* * * * * 

TTT GTT CCA ACG GCA TTA AAT AAG GTG GTG ATG GCC GAG CCA GAA AAT GAC TCC GCT CGT CAA TTG 
FVATALNKVVMAEPENDSAROL> 

7620 7640 7660 

4 * * • * 

CTA GCC GAT ACC TAT GAG CAA CTT GGT TAT CAA GCA GAA GGG GCT GGC TGG AGA AAC ATT TAC TTA 
LAOTYEQLGYQAEGAGWRNIYl*> 

7680 7700 7720 

* ♦ * * * • 

ACT GGC CCA CAA GAG CTA CGA GTA GGT ATT CAA GCT GGC GCG CCT AAA ACC GCA TCG CCA GAT GTC 
TGAQBLRVQIQAGAPKTASADV> 

7740 7760 7780 7800 

• ♦ • <t • * 

ATC ACT GAA ATG GAC ATG CCG ACT CTA TTT GAC TTC CTC GCG GTG AAG ATT GAT ACT CAA CAG GCG 
ISEMDMPTLFOFLAVKIDSQQA> 

7820 7840 7860

GCT AAG CAC GGC TTA GTT AAG ATG AAT GTT ATC ACC CCT GAT ACT AAA GAT ATT CTC TAT ATT GAG

AKHGLVKKNVITPDTKDILYIE>

7880 7900 7920

CTA AOC AAC GCT AAC TTA AOC AAC CCA OTO OTC GAC AAA OAC CAA CCA CCT OAC OCA AAC CTT ATS / 
LSMONLSHAVVOKEQAAOAMLW 


PCT/US98/11639


7940 7960 7980 8000

* • , * • • • 

GTT AAT AAA GCT GAC GW AAC CGC ATC TTA CTT GGC CAA GTA ACC CTA AAA GCG TTA TTA GCC AGC 
VNKAOVNRll-LGQVTLKALLAS> 

8020 8040 

GGC GAT GCC AAG CTC ACT GGT GAT AAA ACG CCA TTT ACT AAA ATA GCC CAT ACC ATG GTC GAG TTT 
GDAKLTCDKTAFSK1ADSMVEF> 

8080 8100 ^ B120 ^ B140 

ACA CCT GAC TTC GAA ATC GTA CCA ACG CCT GTT AAA TGACGCA TTAATCTCAA CAAGTGCAAG CTAGACATAA 
TPDrElVPTPVK> 

8160 8180 B200 

AAATGGGGCC ATTAGACGCC CCATTTTTTA TGCAATTTTG AACTA GCT AGT CTT AGC TGA AGC TCG AAC AAC 

<STKASARVV 

8220 B240 8260 

ACC TTT AAA ATT CAC TTC TTC TGC TGC AAT ACT TAT TTG CTG ACA CTG ACC AAT ACT CAG TGC AAA 
<AKPMVEBAAIS1QQCQG1SLAF 

82B0 B300 8320 8340 

ACG ATA ACT ATC ATC AAG ATG GCC CAG TAA ACA ATG CCA ATT ATC AGC AGC GTT CAT TTG CTG TTC 
<RYSDDLHGLLCHWNDAANMQQE 

8360 8380 8400 

TTT AGC CTC AAT CAA ACC TAA ACC AGA CTT TTG TGG CTC AGC GTT AGG CTT ATT AGA ACT CGA CTC 
<KAEIL GLCSKQPEAMPKNSSSE 

8420 B440 8460 

TAG TAA AGC AAG ACC AAT ATC TTG TTT TAA CAA AAC CTC TCG CTG ATT AAG TTG ATG CTC AAC CTT 
<LLALG1DQKLLVQRQNLQHEVK 

8480 8500 8520 8540 

♦ * * 

GTG ATC CGC AAT AGC ATC GGA AAT ATC AAC ACA ATG OCT CAA GCT TTT AGC TGC ATT AAC TCC AAG 
<HDAIADSI0VCHSLSKPANVGL 

8560 B580 8600 

AAA AGT TTC GCT CAG TGC AGA GAA GTC AAA CGC AAA AGA TTT TAG CGA TAA TCC CAG CCC AAG TCC 
<FTESLASFDFAFSKLSLALGLG 

8620 8640 8660 

TTT CGC TTT AAT GTA AGA CTC CTT GAG CGC CCA CAA ATC AAA AAA GCG GTC TCG CTG CAA GGC CTC 
<KAKIYSEKLAWLDFFRDROLAE 

8680 B700 ^ 8720 ^ 8740 

TGG TAA CGC TAA CAA GGC TCG CTT TTC TGA TTC AGA GAA ATA ATG ACT AAG AAT AGA GTG GAT ATT 
<PLALLARKESESFYHSLISH1N 

8760 8780 BBOO 

» . • • • * 

GGT CCT GTT ACG GCA ACG CTC AAT GTC GAC GCC AAA CTC AAT ACT AGC AGA GTC AGT TTC CTC CTT 
<TSNRCRE1DVGFEISASDTEEK 

8820 8840 8860 

* * • ♦ * ♦ 

GCT TGC CTG ACT GGC GCC TTT ATT ATC AGC AGT GCA AAT GCC TAC TAA TAG CCA ATC TCC ACT ATG 
<SAQSAGKIIDATCIGVLLWDCSH 

8880 8900 8920 

♦ • • • * 

ACT CAC ATT AAA GTG GAC CCC GGT TTG AGC AAA TTG CGC ATC ACT CAA TCT ACG CTT ACC TTT GTC 
<SVNFHVGT01^F0ADSLRPKCKD 

8940 8960 8980 9000 

GCC ATA TTC AAA GCG CCA TTC ATT GGG GCG TAT TTC ACT ATG TTC TGA CAA TAA AGC GCG CAA ATA 
<GyEPRWENPRIESHOSI'l''^*^^^ 

9020 9040 • 9060 9060 

GCC TCT TAC CAT TAAA CCTTGAGTTT TAGCTTCTTG TTTAATGTAG CGATTAACCT TAATTAACTC ATCTTCAGCC 
<G R V M 

9100 9120 ^ 9140 ^ 9160 

ACCCATGACT TAACCAACTC TGTAGTCTCG TTATCCCACT CTTGTATTGT TAACGGACAG AAGTATAAGG AAATCAATCC 
9180 9200 9220 9240

♦ ♦*•*♦•• 

AGAAGTTAGC AATTTTTCAG GACACTCTTT AAAGCAACAA ACATAACCCC TATTTTTACC AATTTAACAT CAAAACTAAA 

9260 9280 9300 9320 

* • * * • • * * 

GCCAAAACTA ATTGAGAATA GTCTCAAACT AGCTTTAAAG GAAAAAMTA TAAAAAGAAC ATTATACTTG TATAAATTAT 

9340 9360 9360 9400 

* •••• 

TTTACACACC AAACCCATGA TCTTCACAAA ATTAGCTCCC TCTCCCTAAA ACAAGATTGA ATAAAAAAAT AAACCTTAAC 
9420 9440 9460 94B0 

TTTCATATAG ATAAAACAAA CCAATGGGAT AAAOTATATT GAATTCATTT TTAAGGAAAA ATTCAAATTG AATTCAAGCT 
9500 9520 9540 9560 

CTTCAGTAAA AGCATATTTT GCCGTTAGTG TGAAAAAAAA CAAATTTAAA AACCAACATA GAACAAATAA GCAGACAATA 

9580 9600 9620 9640 

• • « * * *'* * 

AAACCAAGGC GCAACACAAA CAACGCGCTT ACAATTTTCA CAAAAAAGCA ACAAGAGTAA CGTTTAGTAT TTGGATATGG 

9660 9680 9*700 

• • * • * • * 

TTATTGTAAT TGAGAATTTT ATAACAATTA TATTAACOGA ATG A GT ATG TTT TTA AAT TCA AAA CTT TCG CGC 

T5 SMFLNSKL5R> 

9720 9740 9760 

« • • * • * 

TCA GTC AAA CTT GCC ATA TCC GCA CGC TTA ACA GCC TCG CTA GCT ATG OCT GTT TTT GCA GAA GAA 
SVKLAI5A0LTA5LAMPVPAEE> 

9760 9800 9820 9840 

ACT GCT GCT GAA GAA CAA ATA GAA AGA GTC GCA GTG ACC GGA TCG CGA ATC GCT AAA GCA GAG CTA 
TAAEEQIERVAVTGSR ZARAEL> 

9860 9880 9900 

♦ * • * * • ♦ 

ACT CAA CCA GCT CCA GTC GTC AGC CTT TCA GCC GAA GAA CTG ACA AAA TTT GGT AAT CAA GAT TTA 
TQPAPVVSLSAEELTKPGMODL> 

9920 9940 9960 

• « *■ « * * 

GGT AGC GTA CTA GCA GAA TTA CCT GCT ATT GGT GCA ACC AAC ACT ATT ATT GGT AAT AAC AAT AGC 
6SVLAELPAIGATNTZ ZGNNI1S> 

9980 10000 10020 10040 

* * * « • * ♦ 

AAC TCA AGC GCA GGT GTT AGC TCA GCA GAC TTG CCT CGT CTA GGT GCT AAC AGA ACC TTA GTA TTA 
NSSAGV5SADLRRLGANRTLVL> 

10060 10080 10100 

• ♦ • » • * 

GTC AAC GGT AAG CGC TAC GTT GCC GGC CAA CCG GGC TCA GCT GAG GTA GAT TTG TCA ACT ATA CCA 
VNGKRyVAGQPGSAEVDLSTI P> 

10120 10140 10160 

• • • • • • • 

ACT AGC ATG ATC TCG CGA GTT GAG ATT GTA ACC CGC OGT GCT TCA GCA ATT TAT GGT TCG GAC GCT 
T SN I S RVEIVTGGAS A I YGSDA> 

10180 10200 10220 10240 

• « * * • * M 

GTA TCA GGT CTT ATC AAC GTT ATC CTT AAA GAA GAC TTT GAA GGC TTT GAG TTT AAC GCA CGT ACT 
VSGV INVILKEOFEGrEFNART> 

10260 10260 10300 

* « • • * • 

AGC GGT TCT ACT GAA AGT GTA GGC ACT CAA GAG CAC TCT TTT GAC ATT TTG GGT GGT GCA AAC GTT 
SGSTESVGTQEKSFOZLGGANV> 

10320 10340 10360 

• • • « • * * 

GCA GAT GGA CGT GGT AAT OTA ACC TTC TAC GCA GGT TAT GAA CGT ACA AAA GAA GTC ATG GCT ACC 
ADGBGNVTryAGYERTKEVMAT> 

10380 10400 « 10420 

* * • • « « 

GAC ATT CGC CAA TTC GAT GCT TGG GGA ACA ATT AAA AAC GAA GCC GAT OGT GGT GAA CAT GAT GGT 
DIRQFDAWGTIKNEADGOEDDO 

10440 10460 10480 10500 

« « • • • * . * 

ATT CCA GAC AGA CTA CGT CTA CCA CGA GTT TAT TCT GAA ATG ATT AAT GCT ACC GGT GTT ATC AAT 
.IPDRl*RVPRVYSEMINATGVIN> 




WO 98/55625




10520 10540 10560 

« • • • * 

GCA TTT GGT GGT GGA ATT GGT CGC TCA ACC TTT GAC ACT AAC GGC AAT CCT ATT GCA CAA CAA GAA 
AFGGGIGRSTFDSNGHPIAQOE> 


105B0 


10600 10620 


CGT GAT GGG ACT AAC AGC TTT GCA TTT GGT TCA TTC CCT AAT GGC TOT GAC ACA TCT TTC AAC ACT 
RDGTNSFAFGSPPNGCOTCFMT> 


10640 


10660 10680 10700 


GAA GCA TAC CAA AAC TAT ATT CCA GGG GTA GAA AGA ATA AAC GTT GGC TCA TCA TTC AAC TTT GAT 
BAYENYIPGVERINVGSSFNFD> 


10720 


10740 10760 


10840 


TTT ACC GAT AAC ATT CAA TTT TAC ACT GAC TTC AGA TAT GTA AAG TCA GAT ATT CAG CAA CAA TTP 
PTDKIQFYTDFRYVKSDIQQQr> 

10780 10800 10820 

CAG CCT TCA TTC CGT TTT GGT AAC ATT AAT ATC AAT GTT CAA GAT AAC GCC TTT TTG AAT GAC GAC 
QPSFRFGNININVBDNAFLMDD> 

10860 10880 10900 

TTG CGT CAG CAA ATG CTC GAT GCG GGT CAA ACC AAT GCT AGT TTT GCC AAG TTT TTT GAT GAA TTA 
I,RQQMLDAGQTNASFAKFFDEL> 

10920 10940 10960 

» • • ' • 

GCA AAT CGC TCA GCA GAA AAT AAA CGC GAA CTT TTC CGT TAC GTA GGT GGC TTT AAA GGT GGC TTT 
GNRSAENKRELFRYVGGFKGGF> 

10980 11000 11020 

♦ * ♦ • . , * 

GAT ATT AGC GAA ACC ATA TTT GAT TAC GAC CTT TAC TAT GTT TAT GGC GAG ACT AAT AAC CGT CGT 
DI SETl FDyD LYYVYCETNNRR> 

11040 11060 11080 

• * • . • ♦ 

AAA ACC CTT AAT GAC CTA ATT CCT GAT AAC TTT GTC GCA GCT GTC GAC TCT GTT ATT GAT CCT GAT 
KTLNDL1PDNFVAAVDSVIDPD> 

11100 11120 11140 11160 

• • * • 

ACT GGC TTA GCA GCG TGT CGC TCA CAA GTA GCA AGC GCT CAA GGC GAT GAC TAT ACA GAT CCC GCG 
TGLAACRSOVASAQGDDYTDPA> 

11180 11200 11220 

TCT GTA AAT GGT AGC GAC TGT GTT GCT TAT AAC CCA TTT GGC ATG GGT CAA GCT TCA GCA GAA GCC 
SVNGSDCVAYMPFGMGQASAEA> 

11240 11260 11280 

* • • * • • 

CGC GAC TGG GTT TCT GCT GAT GTG ACT CGT GAA GAC AAA ATA ACT CAA CAA GTG ATT GGT GCT ACT 
RDWVSADVTREDKITQQVIGGT> 


11300 


11320 11340 11360 


CTC GGT ACC GAT TCT GAA GAA CTA TTT GAG CTT CAA GGT GGT GCA ATC GCT ATG GTT GTT GGT TTT 
LCTDS EELFELQGGAIAMVVGF> 

11380 11400 11*20 

« « * , * • 

GAA TAC CGT GAA GAA ACG TCT GGT TCA ACA ACC GAT GAA TTT ACT AAA GCA GGT TTC TTC ACA AGC 
EYREETSGSTTDEFTKAGFLTS> 

11440 11460 11480

* • ♦ • * • * 

GCT GCA ACG CCA GAT TCT TAT GGC GAA TAC GAC GTG ACT GAG TAT TTT GTT GAG GTG AAC ATC CCA 
AATPDSYGEYDVTEYFVEVMIP>

11500 11520 11540 11560

* • • • • * * 

GTA CTA AAA GAA TTA CCT TTT GCA CAT GAG TTG AGC TTT GAC GGT GCA TAC CGT AAT GCT GAT TAC 
VLKELPFAHELSFDGAYRNADY> 

11580 11600 11620

TCA CAT GCC GGT AAG ACT GAA GCA TGG AAA GCT GGT ATG TTC TAC TCA CCA TTA GAG CAA CTT GCA
SHAGKTEAWKAGMFYSPLEQLA>

11640 11660 11680 

* • * * • • X 

TTA CGT GGT ACG GTA GCT GAA GCA GTA CGA CCA CCA AAC ATT GCA GAA GCC TTT ACT CCA CGC TCT
LRGTVCEAVRAPNIAEAFSPRS>

11700 11720 11740

CCT CGT TTT OCC CCC GTT TCA GAT CCA TCT CAT CCA OAT AAC ATT AAT CAC CAT CCG GAT CGC CTG 
POFORVSOPCDADIIINDOPDRV> 

11760 11780 11800 11820

icA AAC TCT C^A CCA TTG GC^ ATC CCT CCA GGA TTC CAA OCT AAT CAT AJ^ ACT GTA CAT ACC 
SMCAALCIPPGF QAMDNVSVDT> 

11840 11860 11880

TTA TCT COT CGT AAC CCA CAT CTA AAA CCT GAA ACA TCA ACA TTC TOT Att CCT 
LSCCHPDLKPETSTSFTOCLVW> 

11900 11920 11940

• • • . • 

ACA CCA ACG TTT OCT CAC AAT CTA TCA TTC ACT GTC GAT TAT TAT GAT ATT CAA ATT GAG CAT OCT 
TPTFADNLSFTV0yYOrQIEDA> 

11960 11980 12000 12020

ATT TTC TCA CTA CCC ACC CAC ACT CTG CCT GAT AAC TGT GTT CAC TCA ACT CCC CGA OCT GAC ACC 
XLSVATQTVADMCVDSTGGPDT> 

12040 12060 12080

GAC TTC TCT ACT CAA CTT CAT COT AAT CCA ACQ ACC TAT CAT ATT CAA CTT GTT CGC TCT GOT TAT 
nFCSQVDRNPTTYDXELVRSGY> 

12100 12120 12140 

• • * . • • 

CTA AAT CCC CCG CCA TTC AAT ACC AAA CCT ATT GAA TTT CAA GCT CCA TAC TCA TTA CAT CTA CAC 
LNAAALMTKGIEFQAAYSLDLE> 

12160 12180 12200 12220

TCT TTC AAC CCC CCT GCT GAA CTA CCC TTC AAC CTA TTC CGC AAC CAA TTA CTT GAA CTA CAA COT 
SFNAPGELRFNLLGNQLLELER> 

12240 12260 12280

CTT CAA TTC CAA AAT CCT CCT GAT CAC ATT AAT GAT CAA AAA CCC CAA CTA CCT GAT CCA CAC CTG 

Tefqmrpdeindekcbvcdpel> 

12300 12320 12340 

CAC TTC CGC CTA CGC ATC GAT TAC CGT CTA GAT CAT CTA ACT GTT ACC TCC AAC ACC CCT TAT ATT 
QPRLGIDYRLDDLSVSWKTRYI> 

12360 12380 12400 

GAT ACC GTA CTA ACT TAT OAT GTC TCT CAA AAT OCT CCC TCT CCT GAA GAT TTA TAT CCA CCC CAC 

dsvvtydvsencospeolypgh> 

12420 12440 12460 12480

ItA CCC TCA ATC ACA ACT CAT GAC TTC ACC GCT ACA TAC TAC ATC AAT GAG AAC TTC ATG ATT AAC 

igsmtthdlsatyyineiifhin> 

12500 12520 12540 

CGT CGT CTA CGT AAC CTA TTT GAC CCA CTT CCA OCT CGA TAC ACT AAC CAT CCC CTA TAT GAT CTA 
G GVRNLFDALPPCYTIIDALYDL> 

12560 12580 12600 12620

• • • • • • 

CTT CCT CGC COT CCA TTC CTA CGT ATT AAG GTA ATC ATG TAATTAATTA TTACCCCTCT AACTAATAAA 
VCRRAFLGIKVMM> 

12640 12660 12680 12700 

AATOCAATCT CTTCCTAGAG ATTCCATTTT TTTATCAAAT CCAATCTTAA ACTCGTTCTC CGAOCATCTT ACCCCrTAAA 

12720 12740 12760 12780

• • ♦ • • 

AACCCCGCCC CTCAATCTAA CCCCAAAGTT AATTCCTTAC ACCCACTTAC ACAAACCAAC AATTTCATTA ACACCAGACA 

12800 12820 12840 12860


CACCTCACCC TTTTTATTW ACCCTTGATT TTACTACATA AAATTGCCTT TTACCGCACA ACTCTTCTCC CAAGCTGCTC
12880 12900 12920 12940


GTATCTCTAA TTATTCAGTC CCAGCTGATT CTATTGACCC ATAACCTCAC CTACTCTGCT CTGCCATTAG CTAAACAATA 
12960 12980 13000 13020


TTGACftA AAT GCCOATAAAA TGTGGCTPAG CGCTAACTTC ACCGTAAGTT TTATCGGCAT TAACTCCCAA CAGATTATTA

13040 13060 13080 

* • • » . • • 

ACGGAAACCC GCTAAACTG ATC GCA AAA ATA AAT AGT GAA CAC TTG GAT GAA GCT ACT ATT ACT TCG AAT 

"wF AKINSEHLDBATITSN> Pw. 


13100 13120 13140 

• • • ♦ . • 

AAG TGT ACG CAA ACA GAG ACT CAG GCT CCO CAT AGA AAT CCC ACT ACA ACA CCT GAG ATC CGC CGA 
KCTQTETEARHRNATTTPEMRR> 

13160 13180 13200 13220

* • • • * • • 

TTC ATA CAA GAG TCG GAT CTC AGT GTT AGC CAA CTC TCT AAA ATA TTA AAT ATC AGT GAA GCT ACC 
PIQESDLSVSQLSKILNISEAT>

13240 13260 13280 

* * * • • ♦ 

GTA CGT AAG TGG CGC AAG CGT GAC TCT GTC GAA AAC TGT CCT AAT ACC CCC CAC CAT CTC AAT ACC 
VRKWRKRDSVENCPNTPHKLNT> 

13300 13320 13340

* * . * ♦ • • 

ACG CTA ACC CCT TTG CAA GAA TAT GTG GTT GTG GGC CTG CGT TAT CAA TTG AAA ATG CCA TTA GAC 
TLTPLQEYVVVGLRYQLKMPLD>

13360 13380 13400 13420 

► ♦ , . * . . 

AGA TTG CTC AAA GCA ACC CAA GAG TTT ATC AAT CCA AAC GTG TCG CGC TCA GGT TTA GCA AGA TGT
RLLKATQEFINPNVSRSGLARC>

13440 13460 13480 

• • . ♦ * • 

TTG AAG CGT TAT GGC GTT TCA CGG GTG AGT GAT ATC CAA AGC CCA CAC GTA CCA ATG CGC TAC TTT 
LKRYGVSRVSDIQSPHVPMRYF> 

13500 13520 13540 

* • * ♦ ♦ ♦ • 

AAT CAA ATT CCA GTC ACT CAA GGC AGC GAT GTG CAA ACC TAC ACC CTG CAC TAT GAA ACG CTG GCA
NQIPVTQGSDVQTYTLHYETLA>

13560 13580 13600 

* • • » ♦ * 

AAA ACC TTA GCC TTA CCT AGT ACC GAT GGT GAC AAT GTG GTG CAA GTG GTG TCT CTC ACC ATT CCA 
KTLALPSTDGDNVVQVVSLTIP>

13620 13640 13660 13680 

• « < • . ♦ ♦ • 

CCA AAG TTA ACC GAA GAA GCA CCC AGT TCA ATT TTG CTC GGC ATT GAT CCT CAT AGC GAC TGG ATC 
PKLTEEAPSSILLGIDPHSDWI>

13700 13720 13740 

* • • * • * • 

TAT CTC GAC ATA TAC CAA GAT GGC AAT ACA CAA GCC ACG AAT AGA TAT ATC GCT TAT GTG CTA AAA 
VLDIYQDGNTQATNRYMAYVLK> 

13760 13780 13800 

• • • ♦ * • 

CAC GGG CCA TTC CAT TTA CCA AAG TTA CTC GTG CGT AAC TAT CAC ACC TTT TTA CAG CGC TTT CCT 
KGPFHLRKLLVRMYHTFLQRFP> 

13820 13840 13860 13880 

* * * • • * • 

GGA GCG ACG CAA AAT CGC CGC CCC TCT AAA GAT ATG CCT GAA ACA ATC AAC AAG ACG CCT GAA ACA 
GATQMRRPSKDMPETINKTPET> 

13900 13920 13940 

* * • * * • 

CAG GCA CCC AGT GGA GAC TCA TA ATG AGC CAG ACC TCT AAA CCT ACA AAC TCA GCA ACT GAG CAA

Q A P S G D S>

MSQTSKPTNSATEQ>

13960 13980 14000 

* « • 4 • • ♦ 

GCA CAA GAC TCA CAA GCT GAC TCT CGT TTA AAT AAA CGA CTA AAA CAT ATG CCA ATT GCT ATT GTT
AQDSQADSRLNKRLKDMPIAIV>


14020 14040 14060 

GGC ATG GCG AGT ATT TTT GCA AAC TCT CCC TAT TTG AAT AAG TTT TGG CAC TTA ATC AGC GAA AAA 
GMASIFANSRYLNKFWDLISEK> 

14080 14100 14120 14140 

* * , • • 

ATT GAT GCG ATT ACT GAA TTA CCA TCA ACT CAC TGG CAG CCT GAA GAA TAT TAC GAC GCA GAT AAA 
TDAITELPSTHWQPEEYYDADK>
14160 14180 14200

* w>* I&A CCT GGT GGC TTT TTG CCA GAT GTA CAC TTC AAC CCA ATG 

ACC CCA GCA GAC AAA ACC TAG ^ AAA OCT GGT GGC TTT i^^- ^ ^ ^ ^ ^ p 

14220 14240 14260

GAaTTTCG;CTGCCGCCA;ACATTmaI^«^^ 

14280 1"00 ^ 

„A GOT CGC OCT CA. AA* AOC CAC AOC CTA ACA G« CGT C« C^^ TA= CC^ OT^ 

LOVGGO0K1SHSI-* 

14420 . 
«G AJ^ GTA OCC AAT Aoi OOC AT, ACT CAC ACC C^C ACC GAA AT^ 

14480 ' V* " ' ^ ,«20 . US40 

CAC CL. TAT OTA CA; TOG GA^ GA^ AAC T.G TTC CCA COT TCA CTT a« ^ 

14560 1*5^° 
ATC GCC AA^ CGC TTC GAT ;tt GCC GGC A^ AAC TGT OTT GAT GCT CCC TCT C« ^ 

14620 i^e^o . 

GOT ATG CGT ItC GCG CTA KCK GAG CTA AC; GAA GGT CGC TCT GAK ATG ATG ATC ACC G« G^ 
AAMRMALTELTEGK* 

146B0 ""0 . 

CTG TGT ;CT GAT AAC T^A CCC TCT A,; TAT ATG ACC TTT ^ ACG CCC GCC TTT ACC ACT AAC 

. 

ACC ATT CCA TTT GA; ATC G*C «A AAA GGC ATG AT^ 

14B20 1"" . 

GCO C;* AAC CGT CTJ GAA OAT GCA «0 CCC OAT 0^ G^ CGC ATT TAC TCT GTA ATT AAA GGT GTC 

14880 "900 . 

GOT GCA TcI TCT GAC GOT i^O TTT AAA TCA ATC TAT GCC CCT CGC CCA TCA OGC CAA OCT AAA CCA 

14/40 ' ' ' ° " ^60 " , ""0 . ^SCOO 

CT; AAC CGT GCC ;AT GAT GAC GCA GGT T« G« COG CAT ACC TTA 0^ CTA ATT GA^ 0« CJ^^ 
LHRAVDDAGFAPHTUl.i- 

15020 15040 . "060 

ACA GGT IcT GCA OCA G^T GAC OCG GcI GAG TTT GCC OK CTt TGC TCA OTA TTT OCT GAA GGC AAC 
TGTAAGDAAEFAVj** 

15080 "100 . "120 

;at ACC AAG cL. CAC ATT GC; CTA GGT ;« A« TCA CAA ATT GOT CAT ACT AJ^ 
DTKQHlALGSVKSOi«»" 

15140 ""0 


GOT A^A GCA GGT T,; ATT AAA OCT ^CT CTT OCT TTO CAT CAC AAG OTA CTO CCG CCO ACC ATT AAC 
GTAGLIKAALALBBHV"' 

15220 . 

^ .OT CA; CCA .OC CCT CTT GAT ATC GAA AAC TCA CCO TTT TAT CTA AAC ACT GAG ACT COT 

VSQPSPKi-OlENSrr 

it;iAA 15320 
15280 15300 

CcI ,00 TTA CCA ^CT GTT GAT Gg\ ACO CCG CG; CGC 0« GOT A« AOC T« 
15340 15360 15380

* * • • « 

ACT AAC TTC CAT TTT GTA CTA GAA GAG TAC AAC CAA GAA CAC AGC CGT ACT GAT AGC CAA AAA GCT
TMPHFVLEEYNQEHSRTDSEKA>

15400 15420 15440 15460 

• * * * * 

AAG TAT CGT CAA CGC CAA GTG GCG CAA AGC TTC CTT GTT AGC GCA AGC GAT AAA GCA TCG CTA ATT
KYRQRQVAQSFLVSASDKASLI>

15480 15500 15520 

* • • • • • • 

AAC GAG TTA AAC CTA CTA GCA GCA TCT GCA AGC CAA GCT GAG TTT ATC CTC AAA GAT GCA GCA GCA 
NELNVLAASASQAEPILKDAAA>

15540 15560 15580 

* • • • • * 

AAC TAT GGC GTA CGT GAG CTT GAT AAA AAT GCA CCA CGG ATC GGT TTA GTT GCA AAC ACA GCT GAA
NYGVRELDKNAPRIGLVANTAE>

15600 15620 15640 15660 

* • * * * * * 

GAG TTA GCA GGC CTA ATT AAG CAA GCA CTT GCC AAA CTA GCA GCT AGC GAT GAT AAC GCA TGG CAG 
ELAGLIKQALAKLAASDDNAWQ>

15680 15700 15720 

• * • « * * 

CTA CCT GGT GGC ACT AGC TAC CGC GCC GCT GCA GTA GAA GGT AAA GTT GCC GCA CTG TTT GCT GGC 
LPGGTSYRAAAVEGKVAALFAG> 

15740 15760 15780 

CAA GGT TCA CAA TAT CTC AAT ATG GGC CGT GAC CTT ACT TGT TAT TAC CCA GAG ATG CGT CAG CAA 
QGSQYLNMGRDLTCYYPEMRQQ>

15800 15820 15840 15860

* • * * • * • 

TTT GTA ACT GCA GAT AAA GTA TTT GCC GCA AAT GAT AAA ACG CCG TTA TCG CAA ACT CTG TAT CCA 
FVTADKVFAANDKTPLSQTLYP>

15880 15900 15920 

* • * * * * 

AAG CCT GTA TTT AAT AAA GAT GAA TTA AAG GCT CAA GAA GCC ATT TTG ACC AAT ACC GCC AAT GCC 
KPVFNKDELKAQEAILTNTANA>

15940 15960 15980

* • • * ♦ • * * 

CAA AGC GCA ATT GGT GCG ATT TCA ATG GGT CAA TAC GAT TTG TTT ACT GCG GCT GGC TTT AAT GCC 
QSAIGAISMGQYDLFTAAGFNA>

16000 16020 16040 

«- * * * * *' 

GAC ATG GTT GCA GGC CAT AGC TTT GGT GAG CTA AGT GCA CTG TGT GCT GCA GGT GTT ATT TCA GCT 
DMVAGHSFGELSALCAAGVISA>

16060 16080 16100 16120 

« « « • * • « 

GAT GAC TAC TAC AAG CTG GCT TTT GCT CGT GGT GAG GCT ATG GCA ACA AAA GCA CCG GCT AAA GAC 
DDYYKLAFARGEAMATKAPAKD> 

16140 16160 16180 

* t • * * • * 

GGC GTT GAA GCA GAT GCA GGA GCA ATG TTT GCA ATC ATA ACC AAG AGT GCT GCA GAC CTT GAA ACC 
GVEADAGAKFAIITKSAADLET>

16200 16220 16240 

* ♦ • « • ♦ 

GTT GAA GCC ACC ATC GCT AAA TTT GAT GGG GTG AAA GTC GCT AAC TAT AAC GCG CCA ACG CAA TCA 
VEATIAKFDGVKVANYNAPTQS>

16260 16280 16300 16320 

* * • • * • «r 

GTA ATT GCA GGC CCA ACA GCA ACT ACC GCT GAT GCG GCT AAA GCG CTA ACT GAG CTT GGT TAC AAA 
VIAGPTATTADAAKALTELGYK>

16340 16360 16380 

• • * « • • 

GCG ATT AAC CTG CCA GTA TCA GGT GCA TTC CAC ACT GAA CTT GTT GGT CAC GCT CAA GCG CCA TTT 
AINLPVSGAFHTELVGKAQAPF>

16400 16420 16440 

* • • • • • • 

GCT AAA GCG ATT GAC GCA GCC AAA TTT ACT AAA ACA AGC CGA GCA CTT TAC TCA AAT GCA ACT GGC 
AKAIDAAKFTKTSRALYSNATG> 

16460 16480 16500 16520 

» » ♦ ♦ • ♦ * 

GGA CTT TAT GAA AGC ACT GCT GCA AAG ATT AAA GCC TCG TTT AAG AAA CAT ATG CTT CAA TCA GTG 


GLYESTAAKIKASFKKKMLQSV>

16540 16560 16580

* • • * • • 

CGC TTT ACT AGC CAG CTA GAA GCC ATG TAC AAC GAC GGC GCC CCT GTA TTT GTT GAA TTT GGT CCA
RFTSQLEAMYNDGARVFVEPCP> 

16600 16620 16640

* • • • • ♦ • 

AAG AAC ATC TTA CAA AAA TTA GTT CAA GGC ACG CTT GTC AAC ACT GAA AAT GAA GTT TGC ACT ATC
KNILQKLVQGTLVNTENEVCTI> 

16660 16680 16700

* • • » • * 

TCT ATC AAC CCT AAT CCT AAA GTT GAT AGT GAT CTG CAG CTT AAG CAA GCA GCA ATG CAG CTA GCG 
SINPNPKVDSDLQLKQAAMQLA>

16720 16740 16760 16780 

* * • * • • • 

GTT ACT GGT GTG GTA CTC AGT GAA ATT CAC CCA TAC CAA GCC GAT ATT GCC GCA CCA GCG AAA AAG 
VTGVVLSEIDPYQADIAAPAKK>

16800 16820 16840 

m * * ♦ ♦ • • 

TCG CCA ATG AGC ATT TCG CTT AAT GCT GCT AAC CAT ATC AGC AAA GCA ACT CGC GCT AAG ATG GCC
SPMSISLNAANHISKATRAKMA>

16860 16880 16900 

» • « . • • ♦ 

AAG TCT TTA GAG ACA GGT ATC GTC ACC TCG CAA ATA GAA CAT GTT ATT GAA GAA AAA ATC GTT GAA 
KSLETGIVTSQIEHVIEEKIVE> 

16920 16940 16960 16980 

* ♦ ♦ * • • » 

CTT GAG AAA CTG GTT CAA GTC GAA AAG ATC GTC GAA AAA GTG GTT GAA GTA GAG AAA GTT GTT GAG 
VEKLVEVEKIVEKVVEVEKVVE> 

17000 17020 17040 

* . • , • ♦ 

GTT GAA GCT CCT GTT AAT TCA GTG CAA GCC AAT GCA ATT CAA ACC CGT TCA GTT GTC GCT CCA GTA 
VEAPVNSVQANAIQTRSVVAPV> 

17060 17080 17100 

» * • • • • • 

ATA GAG AAC CAA GTC GTG TCT AAA AAC AGT AAC CCA GCA GTC CAG AGC ATT AGT GGT GAT GCA CTC 
IENQVVSKNSKPAVQSISGDAL> 

17120 17140 17160 17180

• • * * • * • 

AGC AAC TTT TTT GCT GCA CAC CAG CAA ACC GCA CAG TTG CAT CAG CAG TTC TTA GCT ATT CCG CAG 
SNFFAAQQQTAQLHQQFLAIPQ>

17200 17220 17240 

• , » • • * 

CAA TAT CGT GAG ACG TTC ACT ACG CTG ATG ACC GAG CAA GCT AAA CTG GCA AGT TCT GGT GTT GCA 
QYCETFTTLMTEQAKLASSGVA> 

17260 17280 17300 

• . • # • • • 

ATT CCA GAG AGT CTG CAA CGC TCA ATG GAG CAA TTC CAC CAA CTA CAA GCG CAA ACA CTA CAA AGC 
IPESLQRSMEQFHQLQAQTLQS>

17320 17340 17360

* • * « • * 

CAC ACC CAG TTC CTT GAG ATG CAA CCG GGT AGC AAC ATT GCA GCG TTA AAC CTA CTC AAT AGC AGC 
HTQFLEMQAGSNIAALNLLNSS> 

17380 17400 17420 17440 

• » • . . ♦ • 

CAA GCA ACT TAC GCT CCA GCC ATT CAC AAT GAA GCG ATT CAA AGC CAA GTG GTT CAA AGC CAA ACT 
QATYAPAIHNEAIQSQVVQSQT>

17460 17480 17500 

GCA GTC CAG CCA GTA ATT TCA ACA CAA GTT AAC CAT GTG TCA GAG CAG CCA ACT CAA GCT CCA GCT 
AVQPVISTQVNHVSEQPTQAPA>

17520 17540 17560 

• - * * ♦ * 

CCA AAA GCG CAG CCA GCA CCT GTG ACA ACT GCA GTT CAA ACT GCT CCG GCA CAA GTT GTT CGT CAA 
PKAQPAPVTTAVQTAPAQVVRQ>


17580 


17600 17620 17640 


GCC GCA CCA GTT CAA GCC GCT ATT GAA CCG ATT AAT ACA AGT GTT GCG ACT ACA ACG CCT TCA GCC 
AAPVQAAIEPINTSVATTTPSA> 


17660 


17680 17700
TTC AGC CCC CAA ACA CCC CTG AGC CCA ACA AAA GTC CAA GCC ACT ATC CTT GAA CTG CTT GCT GAG
FSAETALSATKVQATMLEVVAE>

17720 17740 17760 

• • • • * 

AAA ACC OCT TAC CCA ACT GAA ATG CTA GAG CTT GAA ATG GAT ATG GAA GCC GAT TTA GGC ATC GAT
KTGYPTEMLELEMDMEADLGID>

17780 17800 17820 17840 

• • • . • 

TCT ATC AAG CGT GTA GAA ATT CTT GGC ACA CTA CAA GAT GAG CTA CCG GGT CTA CCT GAG CTT AGC
SIKRVEILGTVQDELPGLPELS>

17860 17880 17900 

• • * * • 

CCT GAA GAT CTA GCT GAG TGT CGA ACG CTA GGC GAA ATC CTT GAC TAT ATG GGC ACT AAA CTG CCG
PEDLAECRTLGEIVDYMGSKLP> 

17920 17940 17960 

« • ♦ • . • 

GCT GAA GGC TCT ATG AAT TCT CAG CTG TCT ACA GGT TCC GCA GCT CCG ACT CCT GCA GCG AAT GGT
AEGSMNSQLSTGSAAATPAANG>

17980 18000 18020 

« * * * • • 

CTT TCT GCC GAG AAA GTT CAA GCG ACT ATG ATG TCT GTG GTT GCC GAA AAG ACT GCC TAC CCA ACT 
LSAEKVQATMMSVVAEKTGYPT> 

18040 18060 18080 18100

• • • • • • ♦ 

GAA ATG CTA GAG CTT GAA ATG GAT ATG GAA GCC GAT TTA GGC ATA GAT TCT ATC AAG CGC GTT GAA 
EMLELEMDMEADLGIDSIKRVE>

18120 18140 18160 

• • . . • ♦ ♦ 

ATT CTT GGC ACA CTA CAA GAT GAG CTA CCG GGT CTA CCT GAG CTT AGC CCT GAA GAT CTA GCT GAG 
ILGTVQDELPCLPELSPEDLAE>

18180 18200 18220 

« * • • * * 

TCT CGT ACT CTA GGC GAA ATC GTT GAC TAT ATG AAC TCT AAA CTC GCT GAC GGC TCT AAG CTG CCG 
CRTLGEIVDYMMSKLADGSKLP>

18240 18260 18280 18300

* • * • • * ♦ 

GCT GAA GGC TCT ATG AAT TCT CAG CTG TCT ACA ACT GCC GCA GCT GCG ACT CCT GCA GCG AAT GGT 
AEGSMNSQLSTSAAAATPAAMG> 

18320 18340 18360 

• #♦••• 
CTC TCT GCG GAG AAA GTT CAA GCG ACT ATG ATG TCT GTG CTT GCC GAA AAG ACT GGC TAC CCA ACT 
LSAEKVQATMMSVVAEKTGYPT> 

18380 18400 18420 

GAA ATG CTA GAA CTT GAA ATG GAT ATG GAA GCT GAC CTT GGC ATC GAT TCA ATC AAG CCC GTT GAA 
EMLELEMDMEADLGIDSIKRVE>

18440 18460 18480 18500

* ♦ • . • * 

ATT CTT GGC ACA CTA CAA GAT GAC CTA CCG GGT TTA CCT CAG CTA AAT CCA GAA GAT TTG GCA GAC
ILGTVQDBLPGLPELMPEDLAE> 

18520 18540 18560 

• , . • • * 

TCT CGT ACT CTT CGC GAA ATC GTG ACT TAT ATG AAC TCT AAA CTC GCT GAC GCC TCT AAG CTC CCA 
CRTLGEIVTYMNSKLADCSKLP> 

18580 18600 18620 

♦ « • • . » • 

CCT GAA GGC TCT ATG CAC TAT CAG CTG TCT ACA ACT ACC GCT CCT GCG ACT CCT CTA GCG AAT GGT 
AEGSMHYQLSTSTAAATPVAMG>

18640 18660 18680

• * • • • • 

CTC TCT CCA CAA AAA GTT CAA GCG ACC ATG ATC TCT CTA CTT CCA CAT AAA ACT CGC TAC CCA ACT 
LSAEKVQATMMSVVADKTGYPT>

18700 18720 18740 18760 

♦ • • • • * • 

GAA ATG CTT GAA CTT GAA ATC GAT ATG GAA GCC GAT TTA CGT ATC GAT TCT ATC AAG CCC GTT GAA
EMLELEMDMEADLGIDSIKRVE>

18780 18800 18820

• • • 


ATT CTT GGC ACA CTA CAA GAT GAC CTA CCC CCT TTA CCT CAG CTA AAT CCA CAA GAT CTA CCA GAG 
ILGTVQDBLPGLPBLHPEDLAE> 

18840 18860 18880
20020 20040 20060 20080

• « * • • • * 

GTT AGC AAT GCG TTC TTG TGG GCC AAA TTA TTG CAA CCA AAG CTC GTT GCT GGA CCA GAT GCG CGT
VSNAFLWAKLLQPKLVACADAR>

20100 20120 20140 

« * , • ♦ ♦ • 

CGC TCT TTT GTA ACA GTA AGC CGT ATC GAC GGT GGC TTT GGT TAC CTA AAT ACT GAC GCC CTA AAA
RCFVTVSRIDGGFGYLNTDALK>

20160 20180 20200 

• ♦ • * * ♦ 

CAT GCT GAG CTA AAC CAA GCA GCA TTA GCT GGT TTA ACT AAA ACC TTA AGC CAT GAA TGG CCA CAA
DAELNQAALACLTKTLSKEWPQ> 

20220 20240 20260 20280 

« • ft • ♦ * * 

GTO TTC TGT CGC GCG CTA GAT ATT GCA ACA GAT GTT GAT GCA ACC CAT CTT GCT GAT GCA ATC ACC 
VFCRALDIATDVDATHLADAIT>

20300 20320 20340 

* • • ♦ * • 

AGT GAA CTA TTT GAT AGC CAA GCT CAG CTA CCT GAA GTG GGC TTA AGC TTA ATT GAT GGC AAA GTT
SELFDSQAQLPEVGLSLIDGKV>

20360 20380 20400

« * • ♦ • 

AAC CGC GTA ACT CTA GTT GCT GCT GAA GCT GCA GAT AAA ACA GCA AAA GCA GAG CTT AAC AGC ACA
NRVTLVAAEAADKTAKAELNST>

20420 20440 20460 20480 

« « • ♦ • • * 

GAT AAA ATC TTA GTG ACT GGT COG GCA AAA GGG GTG ACA TTT GAA TCT GCA CTG GCA TTA GCA TCT

DKILVTGGAKGVTFECALALAS>

20500 20520 20540 

* « * ft • • 

CGC AGC CAG TCT CAC TTT ATC TTA GCT GGG CGC AGT GAA TTA CAA GCT TTA CCA AGC TGG GCT GAG
RSQSHFILAGRSELQALPSWAE>

20560 20580 20600 

• * * • ♦ * * 

GGT AAG CAA ACT AGC GAG CTA AAA TCA GCT GCA ATC GCA CAT ATT ATT TCT ACT GGT CAA AAG CCA
GKQTSELKSAAIAHIISTGQKP>

20620 20640 20660 

« * • , ♦ ♦ 

ACG CCT AAG CAA GTT GAA GCC GCT GTG TGG CCA GTG CAA AGC AGC ATT GAA ATT AAT GCC GCC CTA 
TPKQVEAAVWPVQSSIEINAAL> 

20680 20700 20720 20740

GCC GCC TTT AAC AAA GTT GGC GCC TCA GCT GAA TAC GTC ACC ATG GAT GTT ACC GAT AGC GCC GCA
AAFNKVGASAEYVSMDVTDSAA>

20760 20780 20800

ft • . ♦ ft • • 

ATC ACA GCA GCA CTT AAT GGT CGC TCA AAT GAG ATC ACC CGT CTT ATT CAT GGC CCA GGT GTA CTA 
ITAALNGRSNEITCLIHGAGVL> 

20820 20840 20860 

♦ . . ft ft « 

GCC GAC AAG CAT ATT CAA GAC AAG ACT CTT GCT GAA CTT CCT AAA GTT TAT GGC ACT AAA GTC AAC 
ADKHIQDKTLAELAKVYGTKVN>

20880 20900 20920 20940 

• ft ft • * ft * 

GGC CTA AAA GCG CTG CTC GCG GCA CTT GAG CCA AGC AAA ATT AAA TTA CTT GCT ATG TTC TCA TCT 
GLKALLAALEPSKIKLLAMFSS> 

20960 20980 21000 

• « • • • ♦ 

CCA GCA GGT TTT TAC CGT AAT ATC GGC CAA AGC CAT TAC GCC ATG TCG AAC GAT ATT CTT AAC AAG 
AACFYCHIGQSDYAMSNDILHK>

21020 21040 21060 

• • • ♦ ♦ • ♦ 

GCA GCG CTG CAG TTC ACC GCT CGC AAC CCA CAA GCT AAA GTC ATG AGC TTT AAC TGG CGT CCT TGG 
AALQFTARNPQAKVMSFNWGPW>

21080 21100 21120 21140

• • * • • • • 

GAT GCC CGC ATG GTT AAC CCA GCG CTT AAA AAG ATG TTT ACC GAG CGT CCT GTG TAC GTT ATT CCA 
D0GMVNPALKKMFTERGVYVIP> 

21160 21180 21200 

* • • • • • 

CTA AAA CCA GCT CCA GAG CTA TTT GCC ACT CAG CTA TTG GCT GAA ACT GGC GTG CAG TTG CTC ATT 
LKACAELFATQLLAETGVQLLI>


21220 21240 21260

GGT ACG TCA ATG CAA GGT CGC ACC CAC ACT AAA GCA ACT GAG ACT GCT TCT GTA AAA AAG CTT AAT
GTSMQGCSOTKATETASVKKLM> 

21280 21300 21320 

• * * ♦ • * 

GCG GGT GAG GTG CTA ACT GCA TCG CAT CCG CGT GCT GGT GCA CAA AAA ACA CCA CTA CAA GCT GTC 
AGEVLSASHPRAGAQKTPLQAV>

21340 21360 21380 21400 

« • * • • * • 

ACT GCA ACG CGT CTG TTA ACC CCA AGT GCC ATG GTC TTC ATT GAA GAT CAC CGC ATT GGC GGT AAC
TATRLLTPSAMVFIEDHRIGGN>

21420 21440 21460 

• • ♦ • ♦ * • 

ACT GTO TTG CCA ACG GTA TGC GCC ATC GAC TGG ATG CGT GAA GCG GCA AGC GAC ATG CTT GGC GCT 
SVLPTVCAIDWMREAASDMLGA>

21480 21500 21520 

• • • • • • 

CAA CTT AAG GTA CTT GAT TAC AAG CTA TTA AAA GGC ATT GTA TTT GAG ACT GAT GAG CCG CAA GAG 
QVKVLDYKLLKGIVFETDEPQE>

21540 21560 21580 21600 

* ♦ • * • * • 

TTA ACA CTT GAG CTA ACG CCA GAC GAT TCA GAC GAA GCT ACG CTA CAA GCA TTA ATC AGC TCT AAT
LTLELTPDDSDEATLQALISCN>

21620 21640 21660 

* • • • • . * 

GGG CGT CCG CAA TAC AAG GCG ACG CTT ATC AGT GAT AAT GCC GAT ATT AAG CAA CTT AAC AAG CAG
GRPQYKATLISDNADIKQLNKQ> 

21680 21700 21720 

• • • • ♦ • 

TTT GAT TTA AGC GCT AAG CCG ATT ACC ACA GCA AAA GAG CTT TAT AGC AAC GGC ACC TTG TTC CAC 
FDLSAKAITTAKELYSNGTLFH>

21740 21760 21780 21800 

• • 4 * • • • 

GGT CCG CGT CTA CAA GGG ATC CAA TCT GTA CTG CAG TTC GAT GAT CAA GGC TTA ATT GCT AAA GTC
GPRLQGIQSVVQFDDQGLIAKV>

21820 21840 21860

• . • • ♦ • 

GCT CTG CCT AAC GTT GAA CTT AGC CAT TGT GGT GAG TTC TTG CCG CAA ACC CAC ATG CGT GGC AGT
ALPKVELSDCCEFLPQTHMGGS>

21880 21900 21920 

« , * • « . • 

CAA CCT TTT GCT GAG GAC TTG CTA TTA CAA GCT ATG CTG GTT TGG GCT CGC CTT AAA ACT GGC TCG 
QPFAEDLLLQAMLVWARLKTGS>

21940 21960 21980 

. • ♦ * • • 

GCA AGT TTG CCA TCA AGC ATT GGT GAG TTT ACC TCA TAC CAA CCA ATG GCC TTT GCT GAA ACT GGT 
ASLPSSIGEPTSYQPMAFGETG>

22000 22020 22040 22060 

» ♦ ♦ . * • » 

ACC ATA GAG CTT GAA GTG ATT AAG CAC AAC AAA CGC TCA CTT GAA GCG AAT GTT GCG CTA TAT CGT 
TIELEVIKHNKRSLEANVALYR>

22080 22100 22120 

GAC AAC GCC GAG TTA AGT GCC ATG TTT AAG TCA GCT AAA ATC ACC ATT AGC AAA AGC TTA AAT TCA 
DNCELSAMFKSAKITISKSLNS>

22140 22160 22180 22200 

* * • ♦ • * • 

CCA TTT TTA CCT GCT GTC TTA GCA AAC GAC AGT GAG GCG AAT TAOTCGA ACAAACGCCT AAAGCTAGTG
APLPAVLANDSEAN> 

22220 22240 22260 

• * ♦ ♦ w ♦ 

CO ATG CCG CTG CGC ATC GCA CTT ATC TTA CTG CCA ACA CCG CAG TTT GAA GTT AAC TCT GTC GAC
MPLRIALILLPTPQFEVNSVD>

22280 22300 22320 


CAG TCA GTA TTA GCC AGC TAT CAA ACA CTG CAG CCT GAG CTA AAT GCC CTG CTT AAT AGT GCG CCG
QSVLASYQTLQPELNALLNSAP>



22340 22360 22380 

♦ 

ACA CCT GAA ATG CTC AGC ATC ACT ATC TCA GAT GAT AGC GAT GCA AAC ACC TTT GAC TCG CAG CTA
TPEMLSITISDDSDANSPESQL>

22400 22420 22440 22460 

* • • * • • • 

AAT GCT GCG ACC AAC GCA ATT AAC AAT CGC TAT ATC GTC AAG CTT GCT ACG GCA ACT CAC GCT TTO 
NAATNAINNGYIVKLATATHAL> 

22480 22500 22520

* • * • • * 

TTA ATG CrC GOT GCA TTA AAA GCG GCG CAA ATG CGG ATC CAT CCT CAT GCG CAG CTT GCC GCT ATG 
LMLPALKAAQMRIHPHAQLAAM> 

22540 22560 22580

* • * • • • ♦ ♦ 

CAG CAA GCT AAA TCG ACG CCA ATG AGT CAA GTA TCT GGT GAG CTA AAG CTT CGC GCT AAT GCG CTA 
QQAKSTPMSQVSGELKLGAMAL> 

22600 22620 22640 22660 

* ft * • • • • 

AGC CTA GCT CAG ACT AAT GCG CTG TCT CAT GCT TTA AGC CAA GCC AAG CGT AAC TTA ACT GAT GTC 
SLAQTNALSHALSQAKRNLTDV> 

22680 22700 22720 

ft ft w • * • 

AGC GTG AAT GAG TCT TTT GAG AAC CTC AAA AGT GAA CAG CAG TTC ACA GAG GTT TAT TCG CTT ATT 
SVMECFEHLKSEQQFTEVYSLI> 

22740 22760 22780 

* - * • • • • • 

CAG CAA CTT GCT AGC CGC ACC CAT GTG AGA AAA GAG GTT AAT CAA GGT GTG GAA CTT GGC CCT AAA
QQLASRTHVRKEVNQGVELGPK>

22800 22820 22840 

» ■ • • • • ♦ 

CAA GCC AAA AGC CAC TAT TGG TTT AGC GAA TTT CAC CAA AAC CGT GTT GCT GCC ATC AAC TTT ATT 
QAKSHYWFSEFHQNRVAAINFI>

22860 22880 22900 22920 

* * • • ft • • 

AAT GGC CAA CAA CCA ACC AGC TAT GTG CTT ACT CAA GGT TCA GGA TTG TTA GCT GCG AAA TCA ATG 
KGQQATSYVLTQGSGLLAAKSM>

22940 22960 22980 

* • ft • • ft • 

CTA AAC CAC CAA AGA TTA ATG TTT ATC TTG CCG GGT AAC AGT CAG CAA CAA ATA ACC GCA TCA ATA 
LNQQRLMFILPGNSQQQITASI>

23000 23020 23040 

♦ • * * « * 

ACT CAG TTA ATG CAG CAA TTA GAG CGT TTG CAG GTA ACT GAG GTT AAT GAG CTT TCT CTA GAA TGC 
TQLMQQLERLQVTEVMELSLEC>

23060 23080 23100 23120 

* * ft • ft • * 

CAA CTA GAG CTG CTC AGC ATA ATG TAT GAC AAC TTA GTC AAC GCA GAC AAA CTC ACT ACT CGC GAT 
QLELLSIMYDNLVNADKLTTRD>

23140 23160 23180 

ft * • * * • 

AGT AAG CCC GCT TAT CAC GCT GTG ATT CAA GCA AGC TCT GTT AGC GCT GCA AAG CAA GAG TTA AGC 
SKPAYQAVIQASSVSAAKQELS> 

23200 23220 23240 

* ♦ ft • ft • * 

GCG CTT AAC GAT GCA CTC ACA GCG CTG TTT GCT GAG CAA ACA AAC GCC ACA TCA ACG AAT AAA GGC
ALNDALTALFAEQTNATSTHKG>

23260 23280 23300 23320

ft ♦ ft ' ft ft * • 

TTA ATC CAA TAC AAA ACA CCG GCG GGC AGT TAC TTA ACC CTA ACA CCG CTT GGC AGC AAC AAT GAC 
LIQYKTPAGSYLTLTPLGSMMD>

23340 23360 23380 

* ft ft^ • ft » 

AAC GCC CAA GCG GGT CTT GCT TTT GTC TAT CCG GGT GTG GGA ACG GTT TAC GCC GAT ATG CTT AAT
NAQAGLAFVYPGVGTVYADMLN>

23400 23420 23440 

* ft ft ft * • * 

GAG CTG CAT CAG TAC TTC CCT GCG CTT TAC GCC AAA CTT GAG CGT GAA GGC GAT TTA AAG GCG ATG 
ELHQYFPALYAKLEREGDLKAM>

23460 23480 23500 

• ft « . ft » 

CTA CAA GCA CAA GAT ATC TAT CAT CTT CAC CCT AAA CAT CCT GCC CAA ATC ACC TTA GGT GAC TTA 
LQAEDIYHLDPKHAAQMSLGDL>

23520 23540 23560 23580 


GCC ATT OCT CGC GTO GGG AGC AGC TAC CTG TTA ACT CAG CTG CTC ACC GAT GAG TTT AAT ATT AAG
;vij^GVGSSYLLTQLLTDEFNIR> 

23600 23620 23640 

• * • * • • • 

CCT AAT TTT GCA TTA CGT TAC TCA ATG GGT GAA GCA TCA ATG TGG GCA AGC TTA GGC GTA TGG CAA
PNFALGYSMGEASMWASLGVWQ> 

" ♦ 

23660 23680 23700

* * • * • 

AAC CCG CAT CCG CTG ATC AGC AAA ACC CAA ACC GAC CCG CTA TTT ACT TCT GCT ATT TCC GGC AAA 
NPHALISKTQTDPLFTSAISGK>

23720 23740 23760 23780 

* • • • • • • 

TTG ACC CCG GTT AGA CAA GCT TGG CAG CTT GAT GAT ACC GCA GCG GAA ATC CAG TGG AAT AGC TTT

LTAVRQAWQLDDTAAEIQWNSF>

23800 23820 23840

• • • • 

GTG GTT AGA AGT GAA GCA GCG CCG ATT GAA GCC TTG CTA AAA CAT TAC CCA CAC GCT TAC CTC GCG 
VVRSEAAPIEALLKDYPHAYLA>

23860 23880 23900 

« • * ♦ • • ♦ 

ATT ATT CAA GGG GAT ACC TGC GTA ATC GCT GGC TGT GAA ATC CAA TGT AAA GCG CTA CTT GCA CCA 
IIQGDTCVIAGCEIQCKALLAA>

23920 23940 23960 23980 

• ♦ • • 

CTG GGT AAA CGC GGT ATT GCA GCT AAT CGT GTA ACG GCG ATG CAT ACG CAG CCT GCG ATG CAA GAG 
LGKRGIAANRVTAMHTQPAMQE>

24000 24020 24040 

• » * • • ♦ 

CAT CAA AAT GTG ATG GAT TTT TAT CTG CAA CCG TTA AAA GCA GAG CTT CCT AGT GAA ATA AGC TTT 
HQMVMDFYLQPLKAELPSEISF> 

24060 24080 24100 

♦ * . • * • • 

ATC AGC GCC GCT GAT TTA ACT GCC AAG CAA ACG GTG AGT GAG CAA GCA CTT AGC AGC CAA GTC GTT 
ISAADLTAKQTVSEQALSSQVV> 

24120 24140 24160 

♦ * * ♦ ♦ « 

GCT CAG TCT ATT GCC GAC ACC TTC TGC CAA ACC TTG GAC TTT ACC GCG CTA GTA CAT CAC GCC CAA 
AQSIADTFCQTLDPTALVHHAQ>

24180 24200 24220 24240

• « • . • ♦ « 

CAT CAA CGC GCT AAG CTG TTT GTT GAA ATT GCC GCG GAT AGA CAA AAC TGC ACC TTG ATA GAC AAG 
HQGAKLFVEI0ADRQKCTL1DK> 

24260 24280 24300 

ATT CTT AAA CAA GAT GGT GCC AGC AGT GTA CAA CAT CAA CCT TGT TGC ACA GTG CCT ATG AAC GCA 
IVKQDGASSVQHQPCCTVPMNA>

24320 24340 24360 

« • • * ♦ • 

AAA OCT AGC CAA GAT ATT ACC AGC GTG ATT AAA GCG CTT GGC CAA TTA ATT ACC CAT CAG GTO CCA 
KGSQDITSVIKALGQLISHQVP>

24380 24400 24420 24440 

• • • • ♦ • • 

TTA TCC GTG CAA CCA TTT ATT GAT CGA CTC AAC CGC GAG CTA ACA CTT TGC CAA TTG ACC AGC CAA 
LSVQPFIDGLKRELTLCQLTSQ>

24460 24480 24500

• ft • « * 

CAG CTG GCA GCA CAT GCA AAT GTT GAC AGC AAG TTT GAG TCT AAC CAA GAC CAT TTA CTT CAA GGO 
QLAAHANVDSKPESHQDHLLQG>

24520 24540 24560

• * , • • ♦ • 

GAA GTC TA ATG TCA TTA CCA GAC AAT GCT TCT AAC CAC CTT TCT GCC AAC CAG AAA GGC GCA TCT
E V> 

SLPDNASMHLSANQKGAS> 


24580 24600 24620 24640 

* « • w , * ♦ 

CAG GCA AGT AAA ACC ACT AAG CAA AGC AAA ATC GCC ATT GTC GGT TTA GCC ACT CTG TAT CCA CAC 
QASKTSKQSKIAIVGLATLYPD> 




24660 24680 24700


GCT AAA ACC CCG CAA GAA TTT TGG CAG AAT TTG CTC GAT AAA CGC GAC TCT CGC AGC ACC TTA ACT 
AKTPQEFWQNLLDKRDSRSTLT>
24720 24740 24760

« • • » • • 

AAC CAA AAA CTC GGC GCT AAC AGC CAA GAT TAT CAA GOT GTG CAA GGC CAA TCT GAC CGT TTT TAT
NEKLGANSQDYQCVQGQSDRFY>


24780


24800 24820


TGT AAT AAA GGC GGC TAG ATT GAG AAC TTC AGO TTT AAT GCT GCA GGC TAG AAA TTG CCG GAG CAA
CNKGGYIEMFSFMAAGYKLPEQ> 

24840 24860 24880 24900 

• • • . ♦ • * * 

AGO TPA AAT GGC TTG GAC GAC AGC TTC CTT TG6 GCC CTC GAT ACT AGC CGT AAC GCA CTA ATT GAT 
SLNGLDDSFLWALDTSRNALID>

24920 24940 24960 

CCT GGT ATT GAT ATC AAC GGC GCT GAT TTA AGC CGC OCA GGT GTA GTC ATG GGC GCG CTG TCG TTC 
AGIDINGADLSRAGVVMGALSF>

24980 25000 25020 

♦ * • ♦ • • 

CCA ACT ACC CGC TCA AAC GAT CTG TTT TTG CCA ATT TAT CAC AGC GCC GTT GAA AAA GCC CTG CAA 
PTTRSNDLFLPIYHSAVEKALQ>

25040 25060 25080 25100 

» * • * • • 

GAT AAA CTA GGC GTA AAG GCA TTT AAG CTA AGC CCA ACT AAT GCT CAT ACC GCT CGC GCG GCA AAT 
DKLGVKAFKLSPTNAHTARAAN>

25120 25140 25160 

GAG AGC AGC CTA AAT GCA GCC AAT GGT GCC ATT GCC CAT AAC AGC TCA AAA GTG GTG GCC GAT GCA 
ESSLNAANGAIAKNSSKVVADA> 

25180 25200 25220 

CTT GGC CTT GGC GGC GCA CAA CTA AGC CTA GAT GCT GCC TGT GCT AGT TCG GTT TAC TCA TTA AAG 
LGLGGAQLSLDAACASSVYSLK> 

25240 25260 25280 25300 

CTT GCC TGC GAT TAC CTA ACC ACT GGC AAA GCC GAT ATC ATG CTA GCA GGC GCA GTA TCT GGC GCG 
LACDYLSTGKADIMLACAVSGA>

25320 25340 25360 

GAT CCT TTC TTT ATT AAT ATG GGA TTC TCA ATC TTC CAC GCC TAC CCA GAC CAT GGT ATC TCA GTA 
DPFFINMGFSIFHAYPDHGISV> 

25380 25400 25420 

♦ * * * . • 

CCG TTT GAT GCC AGC AGT AAA GGT TTG TTT GCT GGC GAA GCC GCT GGC GTA TTA GTG CTT AAA CGT 
PPDASSKGLFAGEGAGVLVLKR> 

25440 25460 25480 

• . • ♦ 

CTT GAA GAT GCC GAG CGC CAC AAT GAC AAA ATC TAT GCG GTT GTT AGC GGC GTA GGT CTA TCA AAC 
LEDAERDNDKIYAVVSGVGLSN> 

25500 25520 25540 25560

GAC GGT AAA GGC CAG TTT GTA TTA AGC CCT AAT CCA AAA GGT CAC GTG AAG GCC TTT GAA CGT GCT 
DGKGQFVLSPNPKGQVKAFERA>

25580 25600 25620

TAT GCT GCC AGT GAC ATT GAG CCA AAA GAC ATT GAA CTG ATT GAG TGC CAC GCA ACA GGC ACA CCG 
YAASDIEPKDIEVIECHATGTP>

25640 25660 25680 

• • • • ♦ * • 

CTT GGC GAT AAA ATT GAG CTC ACT TCA ATG GAA ACC TTC TTT GAA GAC AAG CTG CAA GGC ACC GAT 
LGDKIELTSMETFFEDKLQGTD>

25700 25720 25740 25760 

* * • , • • • 

GCA CCG TTA ATT GGC TXTA GCT AAG TCT AAC TTA GGC CAC CTA TTA ACT GCA GCG CAT GCG GGG ATC 
APLIGSAKSNLGHLLTAAHAGI> 

25780 25800 • 25820 

• * ♦ • • • 

ATG AAG ATG ATC TTC GCC ATG AAA GAA GGT TAC CTG CCG CCA AGT ATC AAT ATT AGT GAT GCT ATC 
MKMIFAMKEGYLPPSIMISDAI>

25840 25860 25880

^ • * • • • • 

GCT TCG CCG AAA AAA CTC TTC GGT AAA CCA ACC CTG CCT ACC ATG GTT CAA GGC TGQ CCA GAT AAG 
ASPKKLFGKP- - - «.-.»nK> 


wo 98/55625 ^5 / 106 PCT/US5 

25900 25920 25940 25960

* * • * ♦ • * , • 

CCA TCG AAT AAT CAT TTT GGT GTA AGA ACC OCT CAC GCA GGC GTA TCG GTA TTT CGC TTT GGT GGC
PSNMHPGVRTRHAGVSVPGFGG>


25980 26000 26020

* ♦ • ^ • • • 

TGT AAC GCC CAT CTG TTG CTT GAG TCA TAC AAC GGC AAA GGA ACA GTA AAG GCA CAA GCC ACT CAA
CNAHLLLESYNGKGTVKABATQ> 

26040 26060 26080 

« • * # ♦ • • 

GTA CCG CGT CAA GCT GAG CCG CTA AAA GTG GTT GGC CTT GCC TCG CAC TTT GGG CCT CTT AGC AGC 
VPRQAEPLKVVGLASHFGPLSS>

26100 26120 26140 

* • • • * • 

ATT AAT GCA CTC AAC AAT GCT GTG ACC CAA GAT GGG AAT GGC TTT ATC GAA CTG CCG AAA AAG CGC 
INALNNAVTQDGNGPIELPKKR>

26160 26180 26200 26220

* • • « . * * • 

TGG AAA GGC CTT GAA AAG CAC AGT GAA CTG TTA GCT GAA TTT GGC TTA GCA TCT GCG CCA AAA GGT 
WKGLEKHSELLAEPGLASAPKG>

26240 26260 26280 

* • * • « • • 

GCT TAT GTT GAT AAC TTC GAG CTG GAC TTT TTA CGC TTT AAA CTG CCG CCA AAC CAA GAT GAC CGT 
AVVDNPELDFLRFKLPPKEDDR> 

26300 26320 26340 

* * * « * * 

TTG ATC TCA CAG CAG CTA ATG CTA ATG CGA CTA ACA GAC GAA GCC ATT CGT GAT GCC AAG CTT GAG 
LISQQLMLHRVTDEAIRDAKLE>

26360 26380 26400 26420 

* * • ♦ • * • 

CCG GCG CAA AAA GTA GCT GTA TTA GTG GCA ATG GAA ACT GAG CTT GAA CTG CAT CAG TTC CGC GGC 
PGQKVAVLVANETELELHQFRG>

26440 26460 26480 

* * * • • * 

OGG GTT AAC TTG CAT ACT CAA TTA GCG CAA AGT CTT GCC GCC ATG GGC GTG AGT TTA TCA ACG GAT 
RVNLHTQLAQSLAAHGVSLSTD>

26500 26520 26540 

« « • • # * * 

GAA TAC CAA GCG CTT GAA GCC ATC GCC ATG GAC AGC GTG CTT GAT GCT GCC AAG CTC AAT CAG TAC 
EYQALEAIAMDSVLDAAKLNQY>

26560 26580 26600 26620 

* ♦ ♦ • « • • 

ACC AGC TTT ATT GGT AAT ATT ATG GCG TCA CGC GTG GCG TCA CTA TGG GAC TTT AAT GGC CCA GCC 
TSFIGNIMASRVASLMDFNGPA> 

26640 26660 26680 

* • • « * • 

TTC ACT ATT TCA GCA GCA CAG CAA TCT GTG AGC CGC TGT ATC GAT GTG GCG CAA AAC CTC ATC ATG 
FTISAAEQSVSRCIDVAQNLIM>

26700 26720 26740 

♦ • • • • • • 

GAG GAT AAC CTA GAT GCG GTG GTG ATT GCA GCG GTC GAT CTC TCT GGT AGC TTT GAG CAA GTC ATT 
EDNLDAVVIAAVDLSGSFEQVI>

26760 26780 26800 

* • * « * * 

CTT AAA AAT GCC ATT GCA CCT GTA GCC ATT GAG CCA AAC CTC GAA GCA AGC CTT AAT CCA ACA TCA 
LKNAIAPVAIEPNLEASLNPTS>

26820 26840 26860 26880 

* # * • * • • 

GCA AGC TGG AAT GTC GGT GAA GGT GCT GGC GCG GTC GTG CTT GTT AAA AAT GAA GCT ACA TCG GGC 
ASWNVGECAGAVVLVKNEATSG>

26900 26920 26940 

* * • « • • • 

TGC TCA TAC GGC CAA ATT GAT GCA CTT GGC TTT GCT AAA ACT GCC GAA ACA GCG TTG GCT ACC GAC 
CSYGQIOALGFAKTAETALATD> 

26960 26980 27000

* • • » • * 

AAG CTA CTG AGC CAA ACT GCC ACA GAC TTT AAT AAG GTT AAA GTG ATT GAA ACT ATG GCA GCG CCT 
KLLSQTATOFNKVKVIETMAAP> 

27020 27040 27060 27080 

• • * » * • * 

GCT AGC CAA ATT CAA TTA GCG CCA ATA GTT AGC TCT CAA GTG ACT CAC ACT GCT GCA GAG CAG CCT 
ASOIQLAPIVSSQVTHTAAEQR>

27100 27120 27140 

* • • ♦ • » 

GTT GOT CAC TGC TTT GCT GCA GCG GGT ATG GCA AGC CTA TTA CAC GGC TTA CTT AAC TTA AAT ACT 
VGHCFAAAGMASLLHGLtNLNT> 

27160 27180 27200

* * ♦ * • « • 

GTA GCC CAA ACC AAT AAA GCC AAT TGC GCG CTT ATC AAC AAT ATC ACT GAA AAC CAA TTA TCA CAG 
VAQTNKANCALINNI SEMQLSQ> 

27220 27240 27260 27280 

• • • • * « • 

CTC TTG ATT AGC CAA ACA GCC ACC GAA CAA CAA GCA TTA ACC GCG CGT TTA AGC AAT GAG CTT AAA 
LLISQTASEQQALTARLSNELK> 

27300 27320 27340 

• * * » ♦ * 

TCC GAT GCT AAA CAC CAA CTG GTT AAG CAA CTC ACC TTA GGT GGC CGT GAT ATC TAC CAG CAT ATT 
S0AKH0LVKQVTLGGRDIYQHI> 

27360 27380 27400 

* * • • • * • 

GTT GAT ACA OCG CTT GCA AGC CTT GAA AGC ATT ACT CAG AAA TTG GCG CAA GCG ACA GCA TCG ACA 
VDTPLASLESITQKLAQATAST> 

27420 27440 27460 

• 4 • ft * • 

GTG CTC AAC CAA GTT AAA CCT ATT AAG GCC GCT GGC TCA GTC GAA ATG GCT AAC TCA TTC GAA ACG
VVNQVKPlKAAGSVEMANSrBT> 

27480 27500 27520 27540

* * • « ♦ • ♦ 

GAA AGC TCA GCA GAG CCA CAA ATA ACA ATT GCA GCA CAA CAG ACT CCA AAC ATT GGC GTC ACC GCT 
ESSAEPQITIAAQQTA11IGVTA> 

27560 27580 27600 

• * • * « « * 

CAG GCA ACC AAA CGT GAA TTA GGT ACC CCA CCA ATG ACA ACA AAT ACC ATT CCT AAT ACA OCA AAT 
QATKRELOTPPMTTMTIANTAN> 

27620 27640 27660

• • . * • ♦ * 

AAT TTA GAC AAG ACT CTT GAG ACT GTT GCT GGC AAT ACT GTT GCT AGC AAC GTT GGC TCT GGC GAC 
NLDKTLETVAGHTVASKVOSGD> 

27680 27700 27720 27740 

* * • « • • • 

ATA GTC AAT TTT CAA CAG AAC CAA CAA TTG GCT CAA CAA GCT CAC CTC GCC TTT CTT GAA AGC CGC 
IVNFQQNQQLAQOAHLAFLGSR> 

27760 27780 27800 

• » • « ♦ * 

ACT GCG GGT ATC AAG GTG GCT GAT GCT TTA TTG AAG CAA CAG CTA GCT CAA GTA ACA GGC CAA ACT 
SAGMXVAOALLKQQLAQVTGQT> 

27820 27840 27860

* « ♦ * ♦ • » 

ATC GAT AAT CAG GCC CTC GAT ACT CAA GCC GTC GAT ACT CAA ACA AGC GAG AAT GTA GCG ATT GCC 
IDNQALDTQAV0TQTSENVAIA> 

27880 27900 27920 27940 

• • * * • ♦ • 

GCA GAA TCA CCA GTT CAA GTT ACA ACA CCT GTT CAA GTT ACA ACA CCT GTT CAA ATC ACT GTT GTG 
AES PVQVTTPVQVTTPVQrSVV> 

27960 27980 28000

• * • ♦ • •* 

GAG TTA AAA CCA GAT CAC GCT AAT GTG CCA CCA TAC ACC CCG CCA GTG CCT GCA TTA AAG CCG TCT 
BUKPDHANV PPYTPPVPALKPC> 

28020 28040 28060 

* • * w • • • 

ATC TGC AAC TAT GCC GAT TTA CTT CAC TAC GCA GAA GGC GAT ATC GCC AAG GTA TTT GGC ACT GAT 
IWNYADLVEYAEGD1AKVF0SI» 

28080 28100 28120 

* • • • • * 

TAT GCC ATT ATC GAC AGC TAC TCG CGC CGC GTA CGT CTA CCG ACC ACT GAC TAC CTG TTG GTA TCG 
YA 1 I DSYSRRVRL*PTTDYLLVS> 

28140 28160 28180 28200 

« • • * * * • 

CGC CTG ACC AAA CTT GAT GCG ACC ATC AAT CAA TTT AAG CCA TGC TCA ATG ACC ACT GAG TAC GAC 
RVTKLDATINQPKPCSMTTEYD> 

28220 28240 28260 
ATC CCT GTT GAT GCG CCC TAG TTA OTA GAC GGA CAA ATC CCT TOO GCG GTA OCA GTA GAA TCA GGC
I pVDAPYt'VDGOIPWAVAVESG> 

28280 28300 28320

w * « • » * 

CAA TGT OAC TTG ATO CTT ATT AGC TAT CTC GGT ATC OAC TTT GAG AAC AAA GGC GAG CGC GTT TAT 
QCDLMLISYl.GIDFBMKGERVY> 

28340 28360 28380 28400 

• • « • ♦ • • 

CGA CTA CTC GAT TOT ACC CTC ACC TTC CTA GGC GAC TTG CCA CGT GGC GGA GAT ACC CTA CGT TAC 
RLL DCTLTF LGDLPRGGOTLRY> 

28420 28440 28460 

• « ♦ • • • 

GAC ATT AAG ATC AAT AAC TAT GCT CGC AAC GGC GAC ACC CTC CTC TTC TTC TTC TCG TAT GAG TGT 
DIKI NMYARNCDTLLFFFSYEO 

28480 28500 28520

* * ♦ ♦ • • *■ 

TTT GTT GGC GAC AAG ATG ATC CTC AAG ATG GAT GGC GGC TGC GCT GGC TTC TTC ACT CAT GAA GAG 
FVG DK MlLKMDGCCAGFrTDEE> 

28540 28560 28580 28600

• • • * ♦ . ♦ 

CTT GCC GAC GGT AAA GGC GTG ATT CGC ACA GAA GAA GAG ATT AAA GCT CGC AGC CTA GTG CAA AAG 
LA0GKGVIRTEEE1KARSLVQK> 

28620 28640 28660 

* * * ■» • * 

CAA CGC TTT AAT CCG TTA CTA GAT TGT CCT AAA ACC CAA TTT ACT TAT GGT GAT ATT CAT AAG CTA 
QRFN PLLDCPKTQFSYGDIHKL> 

28680 28700 28720 

* • * * • • * 

TTA ACT GCT GAT ATT GAG GGT TGT TTT GGC CCA AGC CAC ACT GGC GTC CAC CAG CCG TCA CTT TGT 
LTADIEGCFGPSHSGVHQPSLO 

28740 28760 28780 

« « * * * ♦ 

TTC GCA TCT GAA AAA TTC TTG ATG ATT GAA CAA GTC AGC AAG GTT GAT CGC ACT GGC GGT ACT TCG 
PASEKFLMIE0VSKVDRTGOTW> 

28800 28820 28840 28860

• * ♦ * ♦ • • 

GGA CTT GGC TTA ATT GAG GGT CAT AAG CAG CTT GAA GCA GAC CAC TGG TAC TTC CCA TGT CAT TTC 
GLGLl EGHKQLEADHJ*Y FPC^H^F> 

28880 28900 28920 

« • • • * • * 

AAC CGC GAC CAA GTG ATG GCT GCC TCG CTA ATC GCT GAA GGT TGT GGC CAG TTA TTG CAG TTC TAT 
KGDQVMAGSLMAEGCGQ tLOFY> 


28940 28960 28980

ATC CTG CAC CTT GGT ATG CAT ACC CAA ACT AAA AAT CGT CGT TTC CAA CCT CTT CAA AAC CCC TCA 
M LHLGMHT0TKNORFQPLENAS> 

29000 29020 29040 29060 

* * * « » • • 

CAC CAA GTA CGC TGT CGC GGT CAA GTG CTG CCA CAA TCA GGC GTG CTA ACT TAC CCT ATG GAA GTG 
QQVRCRCQVLPQSGVLTYRMEV> 

29080 29100 29120

« » • • • • 

ACT GAA ATC GCT TTC ACT CCA CCC CCA TAT GCT AAA GCT AAC ATC GAT ATC TTG CTT AAT GCC AAA 
TEIGFSPRPYAKANID1LLNGK> 

29140 29160 29180 

• * * • • « • 

CCO GTA GTC GAT TTC CAA AAC CTA CCG OTG ATG ATA AAA GAC GAA GAT GAG TGT ACT CCT TAT CCA 
AVVDPQMLCVMIKEEDECTRYP> 

29200 29220 29240 29260 

* • • » • • ■* 

CTT TTG ACT GAA TCA ACA ACG GCT AGC ACT GCA CAA GTA AAC GCT CAA ACA AGT GCG AAA AAG GTA 
LLTESTTASTAQV»AQTSAKKV> 

29280 29300 29320

« • « • ♦ • 

TAC AAG CCA GCA TCA C5TC AAT GCG CCA TTA ATG GCA CAA ^TT CCT CAT CTG ACT AAA GAG CCA AAC 
yKPASVMAPl*MAQIPDLTKEPN> 

29340 29360 29380 

* • • « » ♦ • 

AAG GCC GTT ATT CCO ATT TCC CAT GTT GAA GCA CCA ATT ACC CCA GAC TAC CCO AAC CCT CTA CCT 
KOVIPISHVEAPITP0YPNRVP> 

29400 29420 29440 



WO 98/55625

28 / 106

PCT/US98/11639


GAT ACA CTC CCA TTC AGO CCC TAT CAC ATC TTT GAG TTT CCT ACA GCC AAT ATC GAA AAC TGT TTC 
OTVPFTPYHMFCFATON !i:jJr;- B;?- " ; sfc 

29460 29480 29500 29520

» • ♦ • • • • 

OCO CCA QkQ TTC TCA ATC TAT CGC GGC ATG ATC CCA CCA CCT ACA CCA TCC GOT CAC TTA CAA GTG 
C PE F 5 I yRGMIPP*RTPCGDLOV> 

29540 29560 29580

ACC ACA CCT CTG ATT CAA GTT W^C GGT AAG COT CCC CAC TTT AAA AAG CCA TCA TCG TOT ATC CCT 
TTRV IBVNCKRG0FKKPSSC1A> 

29600 29620 29640 

. • • . • • 

CAA TAT GAA CTG CCT CCA CAT GCG TCC TAT TTC GAT AAA AAC ACC CAC CGC OCX GTG ATG CCA TAT 
BYEVP ADAWYFDKMSKGAVMPY> 

29660 29680 29700 29720

TCA ATT TTA ATC CAC ATC TCA CTG CAA CCT AAC CCC TTT ATC TCA CCT TAC ATC GCC ACA ACC CTA 
ISLQPMCFISCYMGTTI.> 


29740 29760 29780

• • ♦ 

GCC TTC CCT GGC CTT CAC CTG TTC TTC CGT AAC TTA CAC GGT AGC GGT GAG TTA CTA CCT GAA CTA 
aFPGLELFPRNLDGSCELLREV> 

29800 29820 29840 

♦ • » • • • • 

GAT TTA CGT GGT AAA ACC ATC CGT AAC CAC TCA CGT TTA TTA TCA ACA CTC ATG CCC GGC ACT AAC 
DLRGKTIRMOSRLLSTVMAGTN> 

29860 29880 29900 29920 

• • , • . 

ATC ATC CAA AGC TTT AGC TTC GAG CTA AGC ACT GAC GGT GAC CCT TTC TAT CCC CGC ACT CCC CTA 
IIQSFSFELSTOGEPFYRCTAV> 

29940 29960 29980 

♦ 

TTT CGC TAT TTT AAA GGT GAC GCA CTT AAA GAT CAG CTA GGC CTA GAT AAC CCT AAA CTC ACT CAO 
FGYFKCDALKO0LCLDNCKVTQ> 

30000 30020 30040

. • • . • ♦ • 

CCA TCC CAT GTA OCT AAC CGC GTT GCT CCA ACC ACT AAG GTG AAC CTC CTT GAT AAC ACC TCC CGT 
PWHVANGVAASTKVNLLDKSCR> 

30060 30080 30100 

* . ♦ • • • 

CAC TTT AAT GCG CCA GCT AAC CAG CCA CAC TAT CGT CTA GCC GCT GCT CAG CTC AAC TTT ATC CAC 
HFNA PANQPHYRLAGGQLMFI»> 

30120 30140 30160 30180 

• * • . * • • 

ACT GTT GAA ATT GTT GAT AAT GGC GGC ACC GAA GGT TTA GGT TAC TTC TAT GCC CAG CCC ACC ATT 
SVEIVDNGGTEGLGYLYAERTI> 

30200 30220 30240 

GAC CCA ACT GAT TCC TTC TTC CAG TTC CAC TTC CAC CAA GAT CCC CTT ATG CCA CCC TCC TTA CGT

DPSDWFFQFHFH0DP^J4^G,SLC> 

30260 30280 30300 

• . . . • • 

GTT CAA GCA ATT ATT GAA ACC ATG CAA GCT TAC GCT ATT ACT AAA GAC TTC CCC GCA CAT TTC AAA 
VEAI 1 ETMQAYAISKDLCADFK* 

30320 30340 30360 30380 

• ♦ • . ♦ . 

AAT CCT AAG TTT GGT CAG ATT TTA TCG AAC ATC AAG TCC AAC TAT CGC CGT CAA ATC AAT CCC CTC 
IIPKFCQILSNIKWKYRCQIMPL> 

30400 30420 30440

• • ♦ . • • 

AAC AAC CAC ATG TCT ATG GAT GTC ACC ATT ACT TCA ATC AAA GAT GAA GAC GGT AAG AAA GTC ATC 
jlKQHSMDVSrTSIKDEOCKKVI> 

30460 30480 30500 

, . « ♦ • • • 

ACA CCT AAT CCC ACC TTC ACT AAA GAT GGT CTG CGC ATA TAC GAG GTC TTC GAT ATA CCT ATC AOC 
TCNASI-SKOOLRIYEVFDIAIS> 


ATC CAA GAA TCT CTA T AAATCCGAGT GACTCTCTCG CTATTPTACT CAATTTCTCT GTCAAAA ^TO CTy CTATA j 
I £ E S V> L 
30600 30620 30640 30660

• * . . • • • 

TTCATAGCCT GCGCCCTTTT TTCTGCAAAT TCAGCAAAAG TATCTGCCTC CTAACTCCAT TTATAAQAAT CGTTTAATTO 

30680 30700 30720 30740 

AAAAGAACAA CAGCTAAGAG CCGCAAGCTC AATATAAATA ATTAAGGCTC TTACAAATA ATG AAT COT ACA GCA ACT 

♦ M » P T A T> 

30760 30780 30800 

0 * « • • • • 

AAC GAA ATG CTT TCT CCG TGG CCA TGC GCT GTG ACA GAG TCA AAT ATC AGT TTT GAG GTG CAA GTC 
MEKLSPWPWAVTESMISFDVQV> 

30820 30840 30860 

• * • ♦ • ♦ 

ATG GAA CAA CAA CTT AAA GAT TTT A6C CGG GCA TCT TAC GTG CTC AAT CAT GCC GAC CAC GGC TTT 
MEQQLKDPSRACYVVNHADHGP> 

30880 30900 30920 30940

* « « • ♦ • ♦ 

GGT ATT GCG CAA ACT GCC GAT ATC GTG ACT GAA CAA GCG GCA AAC AGC ACA GAT TTA CCT GTT AGT 
G1AQTADIVTEQAANSTDLPVS> 

30960 30980 31000 

« * • • • • • 

GCT TTT ACT CCT GCA TTA GCT ACC GAA AGC CTA GGC GAC AAT AAT TTC CGC CGC GTT CAC GGC GTT 
AFTPALGTESLGDMNFRRVHGV> 

31020 31040 31060 

* ♦ • • • • 

AAA TAC GCT TAT TAC GCA GGC GCT ATG GCA AAC GGT ATT TCA TCT GAA GAG CTA GTG ATT GCC CTA 
KYAYYAGAMANGISSEELVIAL> 

31080 31100 31120 31140 

• ♦ ♦ * . » • 

GGT CAA GCT GGC ATT TTG TGT GGT TCG TTT GGA GCA GCC GGT CTT ATT CCA AGT CGC GTT GAA GCG 
GQAGILCGSFGAAGLIPSRVEA> 

31160 31180 31200 

• * * * * • 

GCA ATT AAC CGT ATT CAA OCA GCG CTG CCA AAT GGC CCT TAT ATG TTT AAC CTT ATC CAT AGT CCT 
A1NRIQAALPNGPYMFNLIHSP> 

31220 31240 31260 

• . * * ♦ * 

AGC GAG CCA GCA TTA GAG CGT GGC AGC GTA GAG CTA TTT TTA AAG CAT AAG GTA CGC ACC GTT GAA 
SEPALERGSVBLFLKHKVRTVE> 

31280 31300 31320 31340 

» • ♦ • • ♦ * 

GCA TCA GCT TTC TTA GGT CTA ACA CCA CAA ATC GTC TAT TAC CGT GCA GCA CCA TTG AGC CGA GAC 
ASAPLGLTPQIVYYRAAGLSRD> 

31360 31380 31400 

« • • * . • * 

GCA CAA CGT AAA GTT GTG GTT GGT AAC AAG GTT ATC GCT AAA GTA AGT CGC ACC GAA GTG GCT GAA 
AQGKVVVGNKV1AKVSRTEVAE> 

31420 31440 31460 

• * • ♦ « • 

AAG TTT ATG ATG CCA GCG CCC GCA AAA ATG CTA CAA AAA CTA GTT GAT GAC GGT TCA ATT ACC GCT 
KFMMPAPAKMLQRLVDDGS1TA> 

31480 31500 31520

* « « ♦ • • 

GAG CAA ATG GAG CTG GCG CAA CTT CTA CCT ATC GCT GAC GAC ATC ACT GCA GAG GCC GAT TCA OCT 
EQMELAQLVPMADD1TAEADSG> 

31540 31560 31580 31600

GGC CAT ACT GAT AAC CGT CCA TTA GTA ACA TTG CTG CCA ACC ATT TTA GCG CTG AAA GAA GAA ATT 
CKTDNRPLVTLI*PTIUALKEB1> 

31620 31640 31660 

* * 

CAA GCT AAA TAC CAA TAC CAC ACT CCT ATT CGT GTC GGT TGT GGT GGC GGT GTG GGT ACG CCT GAT 
QAKyOYDTPlRVGCGCCVGTPO> 

31680 31700 31720 

♦ • • • ^ ♦ * 

GCA GCG CTG GCA ACG TTT AAC ATG GGC GCG GCG TAT ATT GTT ACC GGC TCT ATC AAC CAA GCT TGT 
AALATFMMGAAYIVTGSINQAO 

31740 31760 31780 31800 

, ♦ * . * • ♦ 

GTT GAA GCG GGC GCA AGT GAT CAC ACT CGT AAA TTA CTT GCC ACC ACT GAA ATG GCC CAT GTG ACT 
VEAGASDHTRKLLATTEMADVT> 


31820 31840 31860

• • * . * • 

ATC GCA CCA OCT GCA GAT ATG TTC GAG ATG GCC GTA AAA CTG CAG GTG GTT AAG CGC GGC ACG CTA 
MAPAADMrBMGVRLQVVKRGTI*> 

31880 31900 31920

• • * ♦ • ♦ * 

TTC CCA ATG CGC GCT AAC AAG CTA TAT GAG ATC TAC ACtt COT TAC GAT TCA ATC GAA GCG ATC CCA 
FPMRANKLVEIYTRYDSIEAIP> 

31940 31960 31980 32000 

• ft • • • ♦ • 

TTA OAC GAG COT GAA AAG CTT GAG AAA CAA GTA TTC CGC TCA AGC CTA GAT GAA ATA TGG GCA GGT 
LDEREKLEKQVFRSSLDEIWAO> 

32020 32040 32060

* « • • • • 

ACA GTG GCG CAC TTT AAC GAG CGC GAC CCT AAG CAA ATC CAA CGC GCA GAG GGT AAC CCT AAG CGT 
TVAHFNERDPKQIERAEGNPKR> 

32080 32100 32120 

* ♦ • • ♦ ♦ * 

AAA ATG GCA TTG ATT TTC CGT TGG TAC TTA GGT CTT TCT ACT CGC TGG TCA AAC TCA GGC GAA GTG 
KMAL1FRWYLQLSSRWSNS0EV> 

32140 32160 32180 

• * * * * * 

GGT CGT GAA ATG GAT TAT CAA ATT TGG GCT GGC CCT GCT CTC GGT GCA TTT AAC CAA TGG GCA AAA 
GREMDYQIWAGPALGAFNQWAK> 

32200 32220 32240 32260 

* • « , * ♦ * 

GGC AGT TAC TTA GAT AAC TAT CAA GAC CGA AAT GCC GTC GAT TTG CCA AAG CAC TTA ATG TAC GGC 
GSYLDNYQDRNAVDLAKHLMYO 

32280 32300 32320 

• * . . * ♦ • 

GCG GCT TAC TTA AAT CGT ATT AAC TCG CTA ACG GCT CAA GGC GTT AAA GTG CCA GCA CAG TTA CTT 
AAYLNR1NSLTAQCVKVPAQLL> 

32340 32360 32380 32400

* • « * » * ♦ 

CGC TGG AAG CCA AAC CAA AGA ATG GCC TA ATACACTTAC AAAGCACCAG TCTAAAAAGC CACTAATCTT 
RWKPNQRMA> 

32420 32440 32460 32480 

GATTAGTCGC TTTTTTTATT GTGOTCAATA TGAGGCTATT TAGCCTGTAA GCCTGAAAAT ATCAGCACTC TGACTTTACA 

32500 32520 32540 32560 

ACCAAATTAT AATTAAGGCA GGGCTCTACT CATTTATACT GCTAGCAAAC AAGCAAOTTO CCCAGTAAAA CAACAACCTA 

32580 32600 32620 32640

CCTGATTTAT ATCGTCATAA AAGTTGGCTA GAGATTCGTT ATTGATCTTT ACTGATTACA GTCGCTCTGT TTGGAAAAAG 

32660 32680 32700 32720 

GTTTCTCGTT ATCATCAAAA TACACTCTCA AACCTTTAAT CAATTACAAC TTAGGCTTTC TGCGGGCATT TTTATCTTAT 

32740 32760 32780 32800 

TTGCCACAGC TGTATTTGCC TTTAGGTTTT CGQTGCAACT ACCATTAATT GAGQCCTCAT TAGTTAAATT ATCTGACCAA 

32820 32840 32860

* 4 « , • * • 

GAGCTCACCT CTTTAAATTA CGCTTTTCAG CAA ATG AGA AAG CCA CTA CAA ACC ATT AAT TAC GAC TAT GCG 

MRKPLOTINYDYA> 

32880 32900 32920 

• * • * • « 

GTG TGG GAC AGA ACC TAC AGC TAT ATG AAA TCA AAC TCA GCG ACC GCT AAA AGG TAC TAT CAA AAA 
VWDRTYSYMKSNSASAKRYYEK> 

32940 32960 32980 33000 

• • * * • • • 

CAT GAG TAC CCA GAT GAT ACG TTC AAG AGT TTA AAA GTC GAC CGA GTA TTT ATA TTC AAC CCT ACA 
HBYPDDTrKSLKVDGVFIFNRT> 


33020 33040 33060

• •♦•♦•* 

AAT CAG CCA GTT TTT AGT AAA GGT TTT AAT CAT AGA AAT CAT ATA CCG CTC GTC TTT OAA TTA ACT 
NQPVFSKCFNHRMOtPLVFE LT> 

33080 33100 33120 

* * . « • 

GAC TTT AAA CAA CAT CCA CAA AAC ATC GCA TTA TCT CCA CAA ACC AAA CAG GCA CAC CCA CCO GCA 
DFKQHPQN1ALSPQTKQAHPPA> 
33140 33160 33180 33200

ACT AAC CCC TTA GAC TCC CCT GAT GAT GTG CCT TCT ACC CAT GGC GTT ATC GCC ACA CGA TAC GOT 
SKPLOSPDDVPSTHGVIATRYO 

33220 33240 33260 

• • • * 

CCA OCA ATT TAT AGC TCT ACC AGC ATT TTA AAA TCT GAT CGT AGC GGC TCC CAA CTT GGT TAT TTA 
PA1YSSTSII.KSDRSGSQLQYL> 

33280 33300 33320

* w • • • • 

GTC TTC ATT AGG TTA ATT GAT GAA TCG TTC ATC GCT GAG CTA TCG CAA TAC ACT GCC GCA GGT GTT 
VFI RL1DEWFIABLSQYTAAGV> 

33340 33360 33380 33400 

GAA ATC GCT ATG GCT CAT GCC GCA GAC GCA CAA TTA GCC AGA TTA GGC GCA AAC ACT AAQ CTT AAT 
EIAMADAADAQLARLOAflTKLN> 

33420 33440 33460 

• 4 * • ♦ • 

AAA GTA ACC GCT ACA TCC GAA CGG TTA ATA ACT AAT GTC GAT GGT AAG CCT CTC TTG AAG TTA GTG 
KVTATSERLITNVDGKPLLKLV> 

33480 33500 33520

* • . . * 

CTT TAC CAT ACC AAT AAC CAA CCG CCG CCG ATG CTA GAT TAC ACT ATA ATA ATT CTA TTA GTT GAG 
LYHTNN0PPPMLDYS11ILI.VE> 

33540 33560 33580 

* • • ♦ • • 

ATG TCA TTT TTA CTG ATC CTC GCT TAT TTC CTT TAC TCC TAC TTC TTA GTC AGG CCA GTT AGA AAG 
MSFLLILAYFLYSYPLVRPVRK> 

33600 33620 33640 33660 

• • ♦ . ♦ • * 

CTG GCT TCA GAT ATT AAA AAA ATG GAT AAA AGT CGT GAA ATT AAA AAG CTA AGG TAT CAC TAC CCT 
LA SDIKKMDKSRBIKKLRYHYP> 

33680 33700 33720 

ArP ACT GAG CTA GTC AAA GTT CCG ACT CAC TTC AAC GCC CTA ATG GOG ACG ATT CAC GAA CAA ACT 
IT E LV KVATHFMAL MQT I QEQT> 

33740 33760 33780 

* * ♦ • * • 

AAA CAG CTT AAT GAA CAA GTT TTT ATT GAT AAA TTA ACC AAT ATT CCC AAT CGT CGC GCT TTT GAC 
KQLNEQVFIDKLTNIPNRRAFE> 

33800 33820 33840 33860

* . • ♦ * * • 

CAG CGA CTT GAA ACC TAT TGC CAA CTG CTA GCC CGG CAA CAA ATT GGC TTT ACT CTC ATC ATT CCC 
QRLETYCQLLAR0QIGFTLI1A> 

33880 33900 33920 

• * * • • * 

GAT GTG GAT CAT TTT AAA GAG TAC AAC GAT ACT CTT GGG CAC CTT GCT GGG GAT GAA GCA TTA ATA 
DVDHFKEYNDTLGHLAGOEALI> 

33940 33960 33980 

• « . * • • • 

AAA GTG GCA CAA ACA CTA TCG CAA CAG TTT TAC CGT GCA GAA GAT ATT TGT GCC CGT TTT GGT GGT 
KVAQTLSQQFYRAEDICARFGG> 

34000 34020 34040 34060 

• • ♦ * • • 

GAA OAA TTT ATT ATG TTA TTT CGA GAC ATA CCT GAT GAG CCC TTG CAG AGA AAG CTC GAT GCG ATG 
EEF1MLFRDIPDEPLQRKLDAM> 

34080 34100 34120 

* • ♦ . . • 

CTC CAC TCT TTT GCA GAG CTC AAC CTA CCT CAT CCA AAC TCA TCA ACC GCT AAT TAC GTT ACT GTG 
LHSPAELNLPHPNSSTANYVTV> 

34140 34160 34180 

• • * • * 

AGC CTT GGG GTT TGC ACA GTT CTT GCT GTT GAT GAT TTT GAA TTT AAA ACT GAG TCG CAT ATT ATT 
SLGVCTVVAVDDPEFKSBSHII> 

« 

34200 34220 34240 

* • ♦ • * 

GGC AGT CAG GCT GCA TTA ATC GCA GAT AAG GCG CTT TAT CAT GCT AAA GCC TGT GGT CGT AAC CAG 
GSQAALIADKALYHAKACGRNQ> 

34260 34280 34300 34320 

* * * • • * 

TTG TCA AAA ACT ACT ATT ACT GTT GAT GAG ATT GAG CAA TTA GAA GCA AAT AAA ATC OCT CAT CAA 
LSKTTITVDEIEOtiEAMKIGHQ>

34340 34360 34380 34400

GCC TAA XCTCGTTCCA GTACTTTCCC CTAAGTCAGA GCTATTTGCC ACTTCAAGAT GTGGCTACAA GGCTTACTCT 
A> 

34420 34440 34460 34480

TTCAAAACCT GCATCAATAO AACACAOCAA AATACAATAA TTTAAGTCAA TTTAGCCTAT TAAACAGAGT TAATGACAGC 

34500 34520 34540 34560 

TCATGGTCGC AACTTATTAG CTATTTCTAG CAATATAAAA ACTTATCCAT TAGTAGTAAC CAATAAAAAA ACTAATATAT 

34580 34600 34620 34640 

AAAACTATTT AATCATTATT TTACAGAT6A TTAGCTACCA CCCACCTTAA GCTGGCTATA TTCGCACTAG TAAAAATAAA 

34660 34680 34700 34720

CATTAOATCG GGTTCAGATC AATTTACCAC TCTCCTATAA AATOTACAAT AATTCACTTA ATTTAATACT CCATATTTTT 

34740 34760 34780 34800 

* * ♦ • ♦ • » * 

ACAAGTAGAG AGCGGTGATG AAACAAAATA CGAAAGGCTT TACATTAATT GAATTAGTCA TCGTGATTAT TATTCTCGGT 

34820 34840 34860 34880 

* • * ♦ » • • • 

ATACTTGCTG CTCTGGCACT GCCGAAATTC ATCAATGTTC AAGATGACGC TAGGATCTCT CCCATCAGCG GTCAGTTTTC 

34900 34920 34940 34960 

ATCATTTGAA AGTCCCGTAA AACTATACCA TAGCCGTTGG TTAGCCAAAG GCTACAACAC TGCGGTTGAA AAGCTCTCAG 

34980 35000 35020 35040 

GCTTTGGCCA AGGTAATGTT GCATCAAGTG ACACAGGTTT TCCGTACTCA ACATCAGGCA CGAGTACTGA TGTGCATAAA 

35060 35080 35100 35120 

OCTTGTGGTG AACTATGGCA TGGCATTACC GATACAGACT TCACAATTGG TGCGGTTACT GATGGCGATC TAATGACTOC 

35140 35160 35180 35200 

AGATGTCGAT ATTGCTTACA CCTATCGTCG TGATATGTCT ATCTATCGCG ATCTCTATTT TATTCACCGC TCATTACCTA 

35220 35240 35260 35280 

CTAAGGTGAT GAACTACAAA TTTAAAACTG GTGAAATAGA AATTATTGAT GCTTTCTACA ACCCTGACGG CTCAACTGGT 

35300 35320 35340 35360 

CAATTACCAT AAATTTGGCG CTTATCTAAG TTGTACTTCC TCTGACCGAC ACAAATAATC TCGTTTCTCA GCATATATCA 

35380 35400 35420 35440 

AAATACACAG CAAAAATTTG GGGTTAGCTA TATAGCTAAC CCCAAATCAT ATCTAACTTT ACACTGCATC TAATTCCAAA 

35460 35480 35500 35520 

* * • * • * * 
CAGTATCCAG CCAAAAGCCT AAACTATTCT TGACTCA6CG CTAAAATATG CGATGCAACA AACAAGTCTT GGATCGCAAT 

35540 35560 35580 35600 

• *««••** 
ACCTGAGCTA TCAAAAATGG TCACCTCATC AOCACTTTGA CGTCCTCTTG CGGACTCCTT TATCACCTGA CCAATCTCAA 

35620 35640 35660 35680 

TTATCGGCGT ATTTCTGCTA TGTTGAAACT CACCAATAAC AATAGATTGA GAAGCAAAGT CGCAAAACAA GCGAGCATGA 

35700 35720 35740 35760 

CTATATAGOT CACTTGCCAA CTCTTGCTTA CCCACTTTAT CAGOGCCCAT TGCAGAAATA TGCGTTCCTG CTTGTACCCA 

35780 35800 35820 35840

CTGCGCrrCA AATAAACGCG CTTGAGCTGT GGTTGCTGTG ATAATAATAT CTGCTTGTTC ACAAGCAGCT TGTGCATCAC 

35860 35880 35900 35920 

AAGCTTCGGC ATTAATGCCT TTTTCTAATA AACGCTTAAC CAAGTTTTCA GTTTTGCTAG CACTACCGCC AACTACCAAT

35940 35960 35980 36000

ACCTTAGTTA ATGAACGAAC CTTGCTCACT GCTAGCACTT CATATTCAGC CTGATCACCG CTACCAAAAA CAGTTAATAC

36020 36040 36060 36080 
CGTAGCATCT TCTCTCCCGA GCTAACTCAC TGCTACTCCA TCGGCACCAC CAGTGCGGTA AGCATTAACG GTAOTGGCAO
36100 36120 36140 36160 

CAATCACCCJN CTGCAACATA CCCCTTAATC GATCGAOTAA AAATAOGTTA CTGCCCTGGC ATCCTAAACC ATCTTTATGG 
36180 36200 36220 36240

TTATCAGGCC AATAGCTGCC TGTTTTCCAG CCGACAAGGT TTGGCGTTGA AGCCGACTTT AATGAGAACA TTTCATTAAO 

36260 36280 36300 36320 

* * 

GTTCCCGCCC TGTGCATTAA CTACCGGGAA CAAGGTTGCT TTATCATCTA CGGCAGCGAC AAACGCTTCT TTAACAGCGA 
36340 36360 36380 36400 

TATAAGCCAG CTCATGCGAC ATGAGCTTTG ATGTTTCCGC TTCAGTTAAA TAGATCATAT TACCACCCCT GCACTCGATT 
36420 36440 36460 36480 

CCAGATCTCA TAGCCACCAT TATCACCATC AGTATCAAAT ACATOGTACT GAGCGTGCAT TOAAGCTGTT GCACAGGCGT 

36500 36520 36540 36560 

• • * * 

GCrPCGGCAA AATATOTAGA CGACTACCTA CCGGGAACTG CGCTAAMCA ATAACGCCCC CATCAACTGC TTCAATAATG 

36580 36600 36620 36640 

* . * • • * • * 

CCCTGCTCTT OATTAACAGT TATAACCTGT AGACCTGATA ACACGTGACC GCTGTCGTCA CACACTAAAC CATAACCACA 

36660 36680 36700 36720 

ATCTTTTCGC TGCTCTGCAG TACCTCTATC ACCCGAAAGA GCCATCCAAC CCGCATCAAT GAAAATCCAG TTTTTATCAG 

36740 36760 36780 36800 

GATTATGACC AATAACACTG GTCACTACCG TTGCGGCAAT ATCAGTTAAC TGACACACGT TTAGCCCTGC CATGACTAAA 

36820 36840 36860 36880 

TCCAAGAAGG TGTACACACC CGCTCPAACC TCGOTGATCC CATCAAGGTT TTGATAGCTT TGCGCTCTTC GTGTTGAACC 

36900 36920 36940 36960 

AATACTAACG ATGTCACATT GCATACCCGC TGCGCGAATG CGTCAGCAGC TTGTACAGCC GCTGCAACTT CATTTTGCGC 

36980 37000 37020 37040

CGCATCAATT AATTGCTGTT TTTCAAAACA TTGATATGAC TCACCAGCGT GACTHAGTAC CCCGTCAAAA CTCGCTGCGC 

37060 37080 37100 37120 

CAGACGTTAG TATCTGAGCA ATTTCAATCA ACTTATCGGC TTCCGGTGGA ATACCACCAC GATGGCCATC ACAATCAATT 

37140 37160 37180 37200 

TCAATTAATG CTGGTATTTG GCAGTCATAA GAACCACACA AATGATTTAG CTGATCCGCT TGCTCAACAC TATCAAGTAA 

37220 37240 37260 37280 

AACTCTTGCA TTAATACCTT GGTCCAACAT TTTAGCAATA CGCGGCAACT TACCATCGGC AATACCTACT GCATAAATAA 

37300 37320 37340 37360 

TGTCTCTGTA ACCTTTAGAT GCTAAGGCCT CGGCCTCTTT TACCGTTCAT ACAGTGACTG GTGAGTTTTT AGTGGCTAAT 

37380 37400 37420 37440 

AAAAACTCGC CTGCTTCAAG TOATCTTAAC GTTTTAAAAT GCGGTCTTAG GTTTCCACCT AATCCTTCAA TTTTTTGGCG 

37460 37480 37500 37520 

TAC5TTGACTC AGGTTATTAA TAAATACTGG CTTATTTACA TATAAAAACG CTGTATCAAT TGCTTOATAC TOACrTTGCT 

37540 37560 37580 37600 

GAGTCGTGGA AAGTATTTGA OTAGATGGCA TCTTTAATAT CCTAGTTCAT CAATCAATCT AACAACTTTG ATGCCTAGCC 

37620 37640 37660 37680 

• ••♦•*• * 

ACAGTGCCTT GTATTCATGA TGCTTTCGAA AATGCTTATA TTCAAAC1AT TTCAAAGACA TCAAACTTCT TCTTTAATGC 

37700 37720 37740 37760 

TCAGTATCCA CCAGCACGCA TTTATTTTAT ATTAACTATT ATCAAGATAT AGATTAGGTT CAAACCAAAT GATTAOTACT 


fig. 


37780 37800 37820 37840

GAAOATCTAC GTTTTATCAC CGTAATCGCC AGTCATCGCA CCTTAOCTGA TGCCGCTAGA ACACTAAATA TCACOCCACC

37860 37880
• • • • ♦ 

ATCAGTGACA TTAACGTTCC AGCATATTGA AAAGAAACTA TCGATTAGCC TGATC 

10 20 30 40 50 60

AATAGATCGACTCGCAAAAGTTGCTTAAGATAGTGTCAATATAGCTTCTTATTTGTAAAT 

70 80 90 100 110 120

ATTGTTTTTTATGTGTAAACATGTTTAGTGTGTGTAAATGCTGTTAATTATCCTTTTGGG 

130 140 150 160 170 180 

ATTGTAATAGCTGATGTTGCTGGCTAATGAGTACTTTTAGTTCGGCAATATCTTGCTTTA 

190 200 210 220 230 240 

AATCGCTAACTTCAGTTTTTAATTCACCCACACTTGTTGTATTTTTAAGGCTCTCTTCCC 

250 260 270 280 290 300 

CACCATCGACAAACCAGGATGATATGAAACCGGTAAACGTACCAAAGAGACCGACACCTG 

310 320 330 340 350 360 

CAGTCATGAGTAATGCCGCAATGATACGTCCGCCAGTGGTGACGGGGTAGTAGTCACCGT 

370 380 390 400 410 420 

AACCAACAGTCGTTATTGTCACAAATGACCACCAAAGTGCGTCGATGCCGTTATTGATGT 

430 440 450 460 470 480 

TACTGCCTACTTGATCCTGTTCTAACAATAAAATACCGATAGCACCAAAGGTGACAAGGA 

490 500 510 520 530 540 

TGAAGGATATCGCAGATACCAGCGAAAAGGTGGCTTTAAACCGATGTTCAAAAATCATTT 

550 560 570 580 590 600 

TTAAGATAATTTTTGATGAGCGTATATTCTGAATAGATCTTAATACTCTAGCGATACGAA 

610 620 630 640 650 660 

TTATGCGAATAAACTGCAGTTGCTCGACCATCGGAATACTCGACAGTAGGTCAATCCAAC 

670 680 690 700 710 720 

CCCATTTCATAAACTGAAATTTATTCTCAGCTTGGTGAAAGCGAATTACAAAGTCAGTGA 

730 740 750 760 770 780 

AAAAGAATAAGCAAATCGTATTATCTACGCTCGTTAATATTTCAGTGACGTTACTTGAAA 

790 800 810 820 830 840 

AGGTAAAAATAAGTTGCAGTAGTGATGATACGACCACATGAAGTGATAAAATAAGCATGA 

850 860 870 880 890 900 

AAATCTGAAATGGATTTACATCACTGTTGTTTTTGGTGCCACTTTTAAGGTTCGTTTTCA 

910 920 930 940 950 960 

CAATCTGCTGCCTCGGTTCATTGATTTTGTTAATATAAACCTTAGTCAGTAGCAAGACAA 

970 980 990 1000 1010 1020 

AATATATTTACATCAATGTCATCGTATTATTCAACCGCGCGTCGTGTATTCAGACCAAGA 

1030 1040 1050 1060 1070 1080 

TCGTTGTATATGTTAGTCATGTAGCGATGAGATTATCATGCGACAGGAGAGAATTATGTT 

1090 1100 1110 1120 1130 1140 

TGTTATTATTTTTTACGTACCTAAAGTTAATOTTGAAGAAGTAAAACAGGCGTTATTTAA 





1150 1160 1170 1180 1190 1200 

CGTCGGAGCTGGCACCATCGGTGATTATGATAGTTGTGCTTGGCAATGTTTGGGGACTGG 

1210 1220 1230 1240 1250 1260

GCAGTTCCAACCTTTACTTGGTAGCCAGCCACATATTGGTAAGCTAAATGAGGTTGAATT 

1270 1280 1290 1300 1310 1320 

CGTTGATGAGTTTAGAGTAGAAATGGTTTGTCGAGCAGAAAATGTAAGGGCAGCAATAAA 

1330 1340 1350 1360 1370 1380 

TGCACTTATTGCTGCGCACCCTTATGAAGAACCTGCTTATCATATTCTGCAAACATTGAA 

1390 1400 1410 1420 1430 1440 

TCTTGATGAGTTACCTTAAGTTAGATGCACTGCACTTAATTGGTTCGCTGTGCTAGGTTA 

1450 1460 1470 1480 1490 1500 

GCAATTAGCAATTTTGACCATGTTAGCGATAGTTTTGGCACAAGTGATCGATATTAAACT 

1510 1520 1530 1540 1550 1560 

ATCCGATTCAGATCCCATTTTTACTGCTGAATTAGGTTTCATTACACTTGTTCTAGTGGT 

1570 1580 1590 1600 1610 1620

TTTTCCCGACAGGTGTAACTCTGTTACTTGCGTAAGGTTGATAATCTCTACCGCATTGGC 

1630 1640 1650 1660 1670 1680 

AGGAGTTACACCTGCACCAGGCATAATACTAATTCTACCATCTGCTTGGTTAACTAACGT 

1690 1700 1710 1720 1730 1740 

TTGGATTAAGGCGCAGCCTTCTAGCGCTTGAGCTTGTTGACCAGAGGTTAAAATACGCTC 

1750 1760 1770 1780 1790 1800

ACAACCAGCAGTGATCAAGGTCTCCAAGGCTTGTTGTGGATCATTACACAAGTCGAAAGC 

1810 1820 1830 1840 1850 1860 

GCGGTGGAAGGTTACGCCGAGATCACGTGATGCCACCATTAAGCGTTTTAAAGCTGGCTC 

1870 1880 1890 1900 1910 1920 

GTCAATATTACCATCTGCTGTTAACGCGCCAATAACGACCCCTTGGACACCGAGTAACTT 

1930 1940 1950 1960 1970 1980 

CATGAATTTGATGTCGGAAACCATAATATCAACTTCTTGTTCGCTATATACAAAATCACC 

1990 2000 2010 2020 2030 2040 

GGCGCGAGGGCGAATAATGGCATAAATGGGGATCGTTGCTAGATCAATAGACTTTTGTAC 

2050 2060 2070 2080 2090 2100 

AAAACCTGCGTTGGCGGTCAAGCCACCTAATGCTAATGCCGAGCACAACTCAATACGATC 

2110 2120 2130 2140 2150 2160 

GGCGCCAGATGCTTGAGCCGTCAGCAGTGATTCTATATTATCGACACATACTTCTATTGT 

2170 2180 2190 2200 2210 2220 

CATTGTCATATACTTCTCTTTAAAAAGTTTATTAAAAATAATAAAGCCAGCATAAGTCGT 

2230 2240 2250 2260 2270 2280

TTTATACAATATGAAAGGGGAAAAGGCGACTTAGCTCGCCTAGATCAATTATTATGGCAG 

Fig. 5
2290 2300 2310 2320 2330 2340

AATACTGCCGTATTGTGATTAGAAAGACAGTTTTTTAAGCTCAATAGCCGTTATCGCGTT 

2350 2360 2370 2380 2390 2400

GTTATCTACCATCGTGTAACTTTTCTGGCCTGGGTGCTTTATTAACACTGTTTCAGTGGC 

2410 2420 2430 2440 2450 2460 

TGGATTAGGGTGAAATGATTCTTTTTTCAAATCTGTTTTTTTGTATTTGAACGTACCTGT 

2470 2480 2490 2500 2510 2520 

AATGTCTTGCTGCTCACGAAGACGTACAAATATTGGTTGCGCATAGCTTGGTAGTGCCGC 

2530 2540 2550 2560 2570 2580 

ATTGACATGTTGATAGAATTCAGACGCTGAAAATTCATGAATAGGGCAATTCAAAGTCAG 

2590 2600 2610 2620 2630 2640 

CGCGACCATGCCTGCTCGGCCATCGTGATGTGGGAGCTTGACACCATAAGCCACACTTTG 

2650 2660 2670 2680 2690 2700 

CTCAATTTGCACAAAATCGTTAACTTGAGCTTCTACTTGCGTCGTGGCGACATTTTCACC 

2710 2720 2730 2740 2750 2760 

TTTCCAGCGGAATGTATCACCTAATCTATCCACAAAGGAAATATGGCGATAACCTTGGTA 

2770 2780 2790 2800 2810 2820

ATGAACGAGATCGCCGGTATTAAAATAACAGTCACCGTCTTTTAATACTGACTTAAATAG 

2830 2840 2850 2860 2870 2880 

CTTTTTATTACTTTCGTTGTCATCGGTATAACCATCAAATGGTGAACGTTTAGTTATCTT 

2890 2900 2910 2920 2930 2940

TGTTAGCAGTAGCCCTGTTTCTCCCGTTTTTACTTTGGTCATTTTCCCTTTCGCATTATA 

2950 2960 2970 2980 2990 3000 

CACAGGTTTGTCATTGTCAATATCATATTGTATGACGGTAAAAGCAAGTGGAGTAACCCC 

3010 3020 3030 3040 3050 3060 

CGCTGTATGCGGTAAGTTCAGCGCATTGGAGAACACAAGATTACACTCACTGGCGCCATA 

3070 3080 3090 3100 3110 3120 

GAATTCATTAATATGCTCGATCCCAAAACGTTGTTGGAAATGATCCCAAATTTCGGGGCG 

3130 3140 3150 3160 3170 3180 

TAATCCATTACCTATGATTTTCTTTATATTATGCTGTTTGTCTTTATTGCTAGGCGGTAC 

3190 3200 3210 3220 3230 3240 

ATTTAATAAATAACGGCAGAGCTCGCCGATGTAAGTAAACGCAGTGGCATTATGAGCACG 

3250 3260 3270 3280 3290 3300 

AACTTCATCCCAAAAGCGACTTGAACTGAATTTTTCAGAAAGTGCGAGGGTTGCTGCGCT 

3310 3320 3330 3340 3350 3360 

ACCAAACACGGCGCTTAATGACACTGTCAGTGCATTGTTATGGTATAGGGGGAGTGATAA 

3370 3380 3390 3400 3410 3420 

ATACAATACATCATCAGCTGTTAAGCGTAATOATGCCATCCCCATGCCTGCCATGGATTT 

3430 3440 3450 3460 3470 3480

AAACCAACGGTGATGGCTCATTCTTGCTGCTTTTGGCAGTCCAGTTTTTCCCGAGGTAAA 

3490 3500 3510 3520 3530 3540

GATATAAAACGCGCAATGCTTAAGCTGTATTTGTGCTGTTGATTCAGGGTTCAATACTGA 

3550 3560 3570 3580 3590 3600 

ATATCCTGCGACTAGTGTAGATATGTTTTTATAACCATCACTCATGTCTGGCGTTTCTAA 

3610 3620 3630 3640 3650 3660 

AGCGGGTACGTAAAAGACATTCTGTTGTAATGTCGATGACAAATTGGTTTCAATATTATT 

3670 3680 3690 3700 3710 3720 

AATGGCGGATGTGTATAGTTCATCrrGCGATGAGTAATTTGGTATCGACCACGCTAAGACT 

3730 3740 3750 3760 3770 3780 

ATGTTCGAGGATTGAATCCCGTTGTGTCGTATTTATCATACAAGCAATCGCGCCAAGCTT 

3790 3800 3810 3820 3830 3840 

GACAACTGCGAGGGCAATAATGATGGTTTCAGGCCTGTTATCGAGCATGATGGCGACTTT 

3850 3860 3870 3880 3890 3900 

ATCATTTTTACCAATGCCGTATTCATGAAGGAAATGGGCATATTGATTTGCTTGCTTATT 

3910 3920 3930 3940 3950 3960 

CAATGAATCGTAACTATAACGCTGGTCTTTAAATTGTATTGCGATCAAGTCAGAGTTATT 

3970 3980 3990 4000 4010 4020 

GACAGCTTGCTGCTCTAGTAATAAACCAATAGACATAAAACGTTCGGGCTTTGCTTGTTG 

4030 4040 4050 4060 4070 4080 

TAAGTGCCATAAGCCTTTGATGATTGGCTTTGGGGTTTTTAATAGATTGATGGTACTTTT 

4090 4100 4110 4120 4130 4140 

CAGGAATTGTTTGCCGGTTATAACAGTCATAAGCTAATTCTTTTTATCAAGAAGAGGGGT 

4150 4160 4170 4180 4190 4200 

TATGACACCAAATAAATGGGTCACGCGTTGGTTTAATTTGGTTAGACTAAATGTGTTGTT 

4210 4220 4230 4240 4250 4260 

TTGCTGTGATAATGCGACGTTCAAACAAACTTGAGAAGGTAAAAAAATAGCATTTTTAAA 

4270 4280 4290 4300 4310 4320 

TTGAACATCAATACTAATGTGTTGAATATCAATCAAGTTTTCTAACTGTGCGAGCACGCG 

4330 4340 4350 4360 4370 4380 

TGCTTTAGCAAACATGCCATGTGCTATTGCTGTTTTAAACCCCATTAGTTTCGCTGGGAT 

4390 4400 4410 4420 4430 4440 

AAAATGTAAATGGATTGGATTTGTGTCTTTGGAGATATAAGCATATTTATATACGTCAAA 

4450 4460 4470 4480 4490 4500 

AGGACTAAATTTAAACAATGAAATCGGCTCGTAAGCATAATTCGCTGGCGTATTTACTAT 

4510 4520 4530 4540 4550 4560 

TTTCTCACCGCTGGAACGTTGAGATCGTTGGCACGTTTTTCGCTGTTTCGTTTTCTGTAA 

4570 4580 4590 4600 4610 4620

GAATGTCGATGTACACTCCCACGCAAATTGTCCATCTACAAACACATCAATATGAGTATC 

4630 4640 4650 4660 4670 4680

AATGAAACGTCCTGTATCCGTTATGTACTCCTTAATTACACGACATGTGCTCGTCAATAT 

4690 4700 4710 4720 4730 4740 

CGCGTTTAATGCTATCGGTTGATGTTGTGTTATGCGATTTCGATAATGGACTAGTCCTAA 

4750 4760 4770 4780 4790 4800 

TATAGATATCGGAAATTGTGTTGATGTCATGAGTTTCATCAATAATGGAAAGATCATCAC 

4810 4820 4830 4840 4850 4860 

AAATGGATAAGTAACCGGTACATAGTTTGTGTTATTAAACCCACAGCATTTAATATATTG 

4870 4880 4890 4900 4910 4920 

CTTTAAATTTCGCTGATCTATTTTTTGTCCACTGATACTAAATTGCTCAGTACACACTTG 

4930 4940 4950 4960 4970 4980 

TGTCGACCAAGTGTTCATCAGTGTTTTAACAATTGTATTGACCACTGCTTTCACATATAA 

4990 5000 5010 5020 5030 5040 

AAGCGAGATAATCGGTTGCrrTGTTAACAGTGTGATCTGGTTAGCGTGCATTGAAATAAT 

5050 5060 5070 5080 5090 5100 

TCATATAAGAGTATGTAGCATTTATGTTAATATTTTGTTTTGGAAGTTGAATTGGCGAAT 

5110 5120 5130 5140 5150 5160 

CCGTAATCGGTTTATGGCAGTTCGGTCAAATACTTCAGGTAAACTCGTTACTCATACCAT 

5170 5180 5190 5200 5210 5220 

TGATAGTGTTAAAGTGATTGACTGAATAAAGAATAGAGCTAAAAGTGGAAAAATTATGCA 

5230 5240 5250 5260 5270 5280 

AGATGCGGGTATGTTATTACGCATTGCTTATGAGGCAATGAAAGAGTTAGAGGTTGATGT 

5290 5300 5310 5320 5330 5340 

CATTGAAGTACTTTCTCGTTGTAACATAAGTGAAGAAGTACTGAATGATAAGGATCTTCG 

5350 5360 5370 5380 5390 5400 

CACACCTAATCATGCACAAACACATTTTTGGCAAGTATTAGAAGACATATCACAAGATCC 

5410 5420 5430 5440 5450 5460 

TAACATCGGCATTTCACTTGGTGAGAGAATGCCAGTGTTCACGGGGCAGGTATTACAGTA 

5470 5480 5490 5500 5510 5520 

TCTTTTTCTCAGTAGTCCTACATTTGGTACTGGCTGGGAACGCGCAACAAAATACTTTCG 

5530 5540 5550 5560 5570 5580 

ATTAATCAGTGATGCGGCGAGTGTTTCTATCAAGATGGAAGGCTGTGAAGCGCGATTATC

5590 5600 5610 5620 5630 5640 

TGTGAACTTAGATGGTTTAGCGGAAGATGCGAATCGTCATTTGAATGATTGCCTAGTGAT 

5650 5660 5670 5680 5690 5700 

CGGTGCATTTAAATTTTGTTTATATGTGACAQAAGGCGAATTTAAAGTAAGCAAAATAGC 
5710 5720 5730 5740 5750 5760

CTTTGCTCATGCTCGCCCGAAAGATATTACTGCCTATACCAATGTATTTACATGTCCGAT 

5770 5780 5790 5800 5810 5820

TGAGTTTGCTGCCGAAGATAATTATATTTATTTCGATGCTGATTTACTCGAACGTCCTTC 

5830 5840 5850 5860 5870 5880

TTCGCATGCGGAGCCTGAGCTATTCGCCTTACACGATCAGCTTGCAAGCCGTAAAATAGC 

5890 5900 5910 5920 5930 5940 

CAAGTTAGAACTGCAAGATTTAGTGGATAAAGTACGTAAGGTTATTGCACAACAACTTGA 

5950 5960 5970 5980 5990 6000 

GTCTGGTGTGGTGACTTTAGAAAGTATCGCCAC1GAACTTGACATGAAACCACGTATGCT 

6010 6020 6030 6040 6050 6060 

AAGAGCGAAGTTAGCTGACATTGATTATAACTTTAATCAAATACTCGCTGATTTTCGTTG 

6070 6080 6090 6100 6110 6120 

CGAGTTATCAAAAAAACTGTTGGCGAATACGGACGAGTCTATTGATCAGATTGTCTATCT 

6130 6140 6150 6160 6170 6180 

CACTGGTTTTTCTGAACCAAGTACTTTTTATCGTGCCTTTAAGCGCTGGGTTAAAATGAC 

6190 6200 6210 6220 6230 6240 

GCCAATTGAATATCGCCGTAGCAAACTCGCGGTTAGGCATGCTAATCAACACGAGTCCTA 

6250 6260 6270 6280 6290 6300 

AAAATTCGCTGCTTAGTGCATAGTGCATAGTGCATAGTGCTAGTAAGCCAAGTACAAAGC 

6310 6320 6330 6340 6350 6360 

GTTAAAGTTAAGTACTTGAGCGAACCATCAGACACCACTTACTAGATTAAGCACCTATTA 

6370 6380 6390 6400 6410 6420 

ATGATTGACCACAAATTCTGATCGTATTGCCTGTGATCCCTGCAGCTTGAGGTTGCGCAA 

6430 6440 6450 6460 6470 6480 

AAAAAGCTATCGCTTCAGCAACATCAACTGGCTTACCACCTTGTTTTAATGAATTCATAC 

6490 6500 6510 6520 6530 6540 

GACGACCAGCTTCACGAACTGTAAATGGAATCGCTGCTGTCATTTTTGTTTCAATAAAGC 

6550 6560 6570 6580 6590 6600 

CTGGTGCAACAGCATTAATGGTGATGTATTTGTCTGCAAGCGGAGTTTGCATTGCATCAA 

6610 6620 6630 6640 6650 6660 

CATAACCAATGACTGCGGCCTTAGACGTTGCATAATTAGTCTGACCAAAGTTACCCGCAA 

6670 6680 6690 6700 6710 6720 

TCCCACTCATCGAAGACACACAAACAATGCGGCCATAGTCGTTGAGCAGATCATCATTTA 

6730 6740 6750 6760 6770 6780 

GCAGTCGCTCATTGATTCTTTCCATTGCCGACAAGTTAATATCCATCAGTACATCCCAAT 

6790 6800 6810 6820 6830 6840 

GGTTATCCGGCATACGTGCTAGCGTTTTGTCHTTTGTTACCCCGGCATTATGGACGATGA 
6850 6860 6870 6880 6890 6900

TATCAAGCGACTGTTCTCGCACAAAGTCAGCAATGATATTTGGGGCGTCAGCAGCGGTAA 

6910 6920 6930 6940 6950 6960

TATCAGCAACAATGCTGCTACCTTTCAAGCAATGAGCTACTTTTTCAAGGTCCTGTTTTA 

6970 6980 6990 7000 7010 7020 

ATGCCGGAATGTCTAAGCAAATAACATGTGCGCCATCACGGGCGAGTGTTTCAGCAATAG 

7030 7040 7050 7060 7070 7080

CAGCCCCGATGCCACGTGATGCACCAGTGACAAGTGCTGTCTTTCCTTGTAATGGTTTTG

7090 7100 7110 7120 7130 7140 

CCGTGTTACTTGTTTCGTTAATAACTTCGTTAATAACTTCGTTAATAACTTCGTTAATAG 

7150 7160 7170 7180 7190 7200 

CCCCATTAATCGAACCGGGTTTTACGTTAATAACCTGTGCTGAGATATAGGCTGATTTTG 

7210 7220 7230 7240 7250 7260 

CTGAGGTTAAGAAACGTAGCGGGGCCTCTAATAATTGCTCACTACCAGGTTGTACATAGA 

7270 7280 7290 7300 7310 7320 

TAAGTTGACAGGTACTACCATTCTTGCCTATTTCTTTGGCGACACTGCGACAAAACCCTT 

7330 7340 7350 7360 7370 7380 

CTAAAGATCTTTGTACAGTCGCGTAGCTTACATCGTCAAGATGTTCACTCGGATGACCTA 

7390 7400 7410 7420 7430 7440 

ACACGATCACTCTGCTGCATGGCGAGAGCTGCTTAATTACAGGTTGAAAAAAACGATGTA 

7450 7460 7470 7480 7490 7500 

ATGCACTTAATTGCTTGCTGTTCTTAATGCCTGAGGCGTCGAAGATAATACCGTTGAAGC 

7510 7520 7530 7540 7550 7560 

GATCTGTTTTAGCGATAGCATTAAGGCTAATAGGTGTCGCGACTAAAGACGTTTGATTAA 

7570 7580 7590 7600 7610 7620 

ATTCAATATTAAGATCGGCTAACGCTGACGTGTTATTAGGATAAGAAATCGTGACTTCAG 

7630 7640 7650 7660 7670 7680 

CATCTTTAAATGTGTTAAGAATGGGTTTAATTAATTTGCTGTTGCTGGCTGCGCCGATGA 

7690 7700 7710 7720 7730 7740 

GTAAGTTGCCAGAGATGAGATCGGTTCCCTGATCGTAGCGTGTTAACGTAACCGGTCGTG 

7750 7760 7770 7780 7790 7800 

GCAGATTAAGCGCTTTAAATAAACCTGATGTCCACTTGCCATTAGCGAGTTTTGCGTATG 

7810 7820 7830 7840 7850 7860

TATCCGTCATTTTCTAATCCTTGTTATAGTGAACAGTTTGAATCTCGAAGATGTACATGT 

7870 7880 7890 7900 7910 7920 

GTTAAAAATTATCTGATAGCTATGACTTATCTGCCACTACGTAATAATAAATAGACCAGT 

7930 7940 7950 7960 7970 7980 

TCATTACATCGTTAATCGATATAGTATAACTAAATACTAAGTAAATTATAATGATAAGAC 
7990 8000 8010 8020 8030 8040

TGTTATCGTACTCGGATCAAACTCTGATCAGCAAATAATCAAATTAGAGTTTTTATTTTA 

8050 8060 8070 8080 8090 8100

AACTTGTATCAACAATGTTACATTAATGTATCTTACGTCTAATGTGCTACGGGCATATTT 

8110 8120 8130 8140 8150 8160 

AAGTCACTAAATTAAAGGAATAAACCATGACAGGTCAAACAATAAGAAGAGTAGCAATTA 

8170 8180 8190 8200 8210 8220 

TCGGCGGTAACCGTATCCCGTTTGCACGTTCAAATACAGCGTATTCAAAACTAAGTAACC 

8230 8240 8250 8260 8270 8280 

AAGATATGCTGACGGAAACTATCCGTGGCTTGGTGGTTAAATATAACCTACGTGGTGAAC 

8290 8300 8310 8320 8330 8340 

AACTGGGGGAAGTTGTTGCTGGTGCGGTAATTAAGCATTCTCGTGATTTTAACTTAACAC 

8350 8360 8370 8380 8390 8400 

GTGAAGCCGTGCTAAGTGCAGGTCTTGCACCTGAAACGCCTTGTTATGACATTCAACAAG 

8410 8420 8430 8440 8450 8460 

CTTGTGGTACTGGTCTAGCTGCAGCTATCCAAGTAGCAAACAAAATTGCGCTTGGTCAAA 

8470 8480 8490 8500 8510 8520 

TAGAAGCGGGTATTGCTGGTGGTTCTGATACGACATCAGATGCACCGATTGCAGTCAGTG 

8530 8540 8550 8560 8570 8580 

AAGGCATGCGTAGTGTATTACTTGAGCTTAATCGAGCTAAAACGGGTAAGCAACGTTTGA 

8590 8600 8610 8620 8630 8640 

AAGCACTATCTCGTCTACGTCTAAAACACTTTGCGCCACTAACGCCTGCAAATAAAGAGC 

8650 8660 8670 8680 8690 8700 

CGCGTACCAAAATGGCGATGGGCGATCATTGTCAAGTAACAGCGAAAGAGTGGAATATCT 

8710 8720 8730 8740 8750 8760 

CACGTGAAGCACAAGATGCATTGGCCTGCGCAAGTCATCAAAAATTAGCTGCAGCATATG 

8770 8780 8790 8800 8810 8820 

AAGAAGGTTTCTTTGATACGTTAGTTTCACCTATGGCCGGCTTAACGAAAGATAACGTAT 

8830 8840 8850 8860 8870 8880 

TACGCGCAGATACAACAGTTGAGAAACTGGCTAAATTGAAACCTTGTTTTGATAAAGTAA 

8890 8900 8910 8920 8930 8940 

ACGGCACTATGACGGCGGGTAACAGTACTAACCTTACCGATGGAGCATCAGCTGTATTAC 

8950 8960 8970 8980 8990 9000 

TTGCAAGTGAAGAATGGGCAGCGGCACATAACTTACCAGTACAAGCTTATCTAACATTTG 

9010 9020 9030 9040 9050 9060 

GTGAAACGGCCGCTATCGACTTCGTTGATAAGAAAGAAGGTCTGTTAATGGCGCCTGCAT 

9070 9080 9090 9100 9110 9120 

ACGCAGTGCCAAAAATGTTGAAGCGTGCTGGQCTTACATTACAAGACTTCGATTACTATG 
9130 9140 9150 9160 9170 9180

AAATACATGAAGCATTTGCTGCGCAGTTATTAGCAACGCTAGCAGCTTGGGAAGACGAAA 

9190 9200 9210 9220 9230 9240 

AATTCTGTAAAGAAAAACTGGGTCTAGATGCTGCGCTTGGTTCAATTGATATGACCAAGT 

9250 9260 9270 9280 9290 9300 

TAAACGTGAAAGGGAGTAGCTTAGCCACGGGTCACCCATTTGCCGCAACTGGTGGTCGTG 

9310 9320 9330 9340 9350 9360 

TTGTCGCTACGCTAGCGCAATTACTTGATCAGAAAGGTTCAGGTCGTGGTTTGATCTCGA 

9370 9380 9390 9400 9410 9420 

TTTGTGCTGCTGGTGGTCAAGGTATCACGGCAATTTTAGAGAAATAAACGCACTGTTTAT 

9430 9440 9450 9460 9470 9480 

TATCTATTGATTAAGCTGTCCTGAGATACTGGATATTTTTAAATAAAACGCCAATACTGC 

9490 9500 9510 9520 9530 9540 

AGAGTATTGGCGTTTTTTTGTAATACCAATTCCTATATAACGGTGCATTTTAAACACTTA 

9550 9560 9570 9580 9590 9600 

ATTTCCGGCATTGGTATCATAAAAAAGCAGCACCGAAGTGCTGCTTGATTGTAGATTAAC 

9610 9620 9630 9640 9650 9660 

CTATTAAAATAGAGAGGCTAGAATTAGTCTTCGTATGCTTCATTATGTACGCCAGCTGCA 

9670 9680 9690 9700 9710 9720 

CGACCCGATGGATCAGCATTGTTTTGGAAACTTTCATCCCAAGCTAATGCTTCTACAGTT 

9730 9740 9750 9760 9770 9780

GAACAAGCAACGGATTTACCAAACGGTACGCATTTCGCTGCTGAATCACCTGGGAAGTGA 

9790 9800 9810 9820 9830 9840 

TCTTCAAAGATGGCACGATAGTAGTAACCTTCTTTCGTATCTGGTGTGTTAATTGGGAAC 

9850 9860 9870 9880 9890 9900 

TTAAATGCTGCACTTGCTAACATTTGATCAGTTACCGCTTCTTCAACGTGTACTTTAAGT 

9910 9920 9930 9940 9950 9960 

TGGTCAATCCAAGAATAACCAACACCATCAGAGAATTGTTCTTTTTGACGCCATACAATT 

9970 9980 9990 10000 10010 10020 

TCTTCAGGTAGTAAATCTTCAAATGCTTCTCGAATGATGTTTTTCTCAATGCGGTCGCCC 

10030 10040 10050 10060 10070 10080 

GTGATCATTTTTAGTTCAGGGTTTAGACGCATTGACGCATCAACAAATTCTTTATCTAAG 

10090 10100 10110 10120 10130 10140 

AAAGGAACACGTGCTTCGATGCCCCAAGCTGCCATAGATTTGTTTGCACGTAAGCAATCA 

10150 10160 10170 10180 10190 10200 
AACATATGTAATTTATTTACTTTACGTACCGTCTCTTCATGGAATTCTTTCGCATTTGGC 

10210 10220 10230 10240 10250 10260 

GCTTTGTGGAAGTACAAGTAACCACCGAACACiTTCATCAGCACCTTCACCAGAAAGCACC 
10270 10280 10290 10300 10310 10320

ATCTTAATCCCCATGGCTTTAATTTTACGTGCCATTAGGTACATAGGGGTTGATGCACGA 

10330 10340 10350 10360 10370 10380 

ATTGTTGTTACATCGTAGGTTTCAATGTGGTAAATCACGTCGCGTAAAGCGTCGATACCT 

10390 10400 10410 10420 10430 10440 

TCTTGCACAGTAAATTCAATTGAATGATGGATAGTACCTAAGTGATCTGCCACTTTTTGT

10450 10460 10470 10480 10490 10500

GCAGCGGCTAAATCTGGAGAACCATTTAGGCCTACAGAGAAAGAGTGTAGTTGTGGCCAC 

10510 10520 10530 10540 10550 10560 

CATGCTTCGGTTTTACCACCGTCTTCAATACGACGTTTTGCATACTGTTGGGTGATTGCT 

10570 10580 10590 10600 10610 10620 

GAAATAACAGATGAATCTAACCCGCCTGATAATAATACGCCGTAAGGTACATCACACATT 

10630 10640 10650 10660 10670 10680 

AATTGACGTTTAACTGCATCTTCCAAACCTTGCTTAACAACGCTTTTATCACCACCATTT 

10690 10700 10710 10720 10730 10740 

TGTGCAACGTTATCAAAATCTTTCCAATCACGTTGATAATTU^GGCGTGACTACACCATCC 

10750 10760 10770 10780 10790 10800 

TTACTCCACAGGTAATGACCTGCTGGGAATTCTTCAATTTGAGTACAAATTGGCACTAGT 

10810 10820 10830 10840 10850 10860 

GCTTTCATTTCAGAGGCAACATAAAAGTTACCGTGTTCATCATAGCCCGTATAAAGAGGG 

10870 10880 10890 10900 10910 10920 

ATGATACCGATATGGTCACGGCCAATCAGGTAAGCGTCCTCTGTTTCGTCATATAAAGCG 

10930 10940 10950 10960 10970 10980 

AAAGCAAAAATACCATTTAGATCATCTAAAAATTGTGTGCCTTTTTCTTTATATAGCGCA 

10990 11000 11010 11020 11030 11040 

AGTATCACTTCGCAATCTGATTCTGTTTGGAATTCAAAGTCTACGTTCAGCGTTTTCTTT 

11050 11060 11070 11080 11090 11100 

AAATCTTTGTGGTTATAAATTTCACCATTAACAGCAAGTACGTGTGTCTTTTCTTCATTA 

11110 11120 11130 11140 11150 11160 

TATAGCGGCTGTGCACCATTATTTACATCGACAATAGCAAGACGTTCATGAACTAAAATA 

11170 11180 11190 11200 11210 11220 

GCATTGTCACTTGTATAGATACCTGACCAATCTGGGCCGCGGTGACGTAGTAACTTTGAT 

11230 11240 11250 11260 11270 11280 

AGTTCTAGTGCTTGTTCGCGAAGAGGTTTAATGTCTGATTTGATGTCTAGAATTCCGAAT 

11290 11300 11310 11320 11330 11340

ATTGAGCACATAACTAATTCCTTCTGGGGCTGCGTCTGCAGCTAACTTTCTAAATAGTGT 

11350 11360 11370 11380 11390 11400 

GTCTAATTTGCCACATTGTAGATTTAATGCAAACATTAATGATAAAACATTTATAAAAAA 
11410 11420 11430 11440 11450 11460

TGTAATTCAATGTGGAATCGATAATTTAATGGCTTAAAAGTGAAGATCCATTAATTGTGA 

11470 11480 11490 11500 11510 11520

TGGCGAGGTGATAGACCAATGTAGACCTTAATGAATAAAGCAGGCACGATTGAATCCATT 

11530 11540 11550 11560 11570 11580 
CAACGCAAAGTGGTACTAACTATTGTTTTAAACGTTATAAATAGTGTTTTAAAGGTTATA 

11590 11600 11610 11620 11630 11640

AGTAAATAATTTAAAAACAATAATAATCCACATGCATTAAATTTATCATGATAAACCGCT 

11650 11660 11670 11680 11690 11700 

ATATCTCAATGGCAATTTGGGATAAGTGTAAAATATATGTAAAATGAATGAGTTGACTTG 

11710 11720 11730 11740 11750 11760 

CTTTTTTTACACTAAGTGATGAAATTAAAGCTAGATGTCGTTGTTAGCATTGATTAATAA 

11770 11780 11790 11800 11810 11820 

CGTACTAAAATACGACATCTAGTATAGAAATTTAAAAAACAGTTGGTTTTGATAGCATAA 

11830 11840 11850 11860 11870 11880 

CTGCATAAACTAATCAGCTTATTGTCTGTAATATTTTTGTAATTTAAATAGGTTTAATAA 

11890 11900 11910 11920 11930 11940 

AATTATATGTCTGATAAATATAAACCGTACGACCTTTCCTTTAAAAAGACGTTTTTGCTG 

11950 11960 11970 11980 11990 12000 

CCTAAGTTTTGGCCTGTGTGGTTCGGGGTGTTTGCAATATACTTATTAGCTTTTATGCCA

12010 12020 12030 12040 12050 12060
GTAAAGCCGCGTGATAAATTTGCTCGATTCATAGCGAAGAAATTGTTTAGTCTAAAAATG 

12070 12080 12090 12100 12110 12120 

ATGGCAAAGCGTAAAAAGGTAGCAAAGATCAATTTATCTATGTGCTTCCCTGAAATGGAT 

12130 12140 12150 12160 12170 12180 

GATACGGAACAAGACCGTATAATCATGGTCAATCTAGTTACTTTTTGTCAAACTATCTTA 

12190 12200 12210 12220 12230 12240 

AGTTATGCAGAGCCAAGTGCGCGTAGTCGTGCTTATAACCGTGACCGTATGATAGTGCAT 

12250 12260 12270 12280 12290 12300 

GGTGGCGAGAATTTATTTCCGCTACTTGAACAAGGTAAGGCTTGTATCTTATTAGTGCCG 

12310 12320 12330 12340 12350 12360 

CATAGCTTCGCTATTGATTTTGCAGGTTTACACATTGCTTCTTATGGCGCGCCATTTTGT 

12370 12380 12390 12400 12410 12420 

ACTATGTTTAACAATTCTGAGAATGAGTTGTTCGATTGGCTGATGACACGTCAACGCGCT 

12430 12440 12450 12460 12470 12480 

ATGTTTGGAGGCACTGTTTATCACCGCAAGGCAGGGCTAGGGGCTCTAGTTAAATCACTT 

12490 12500 12510 12520 12530 12540 

AAGAGCGGTGAAAGCTGTTATTACTTACCTGMGAAGACCATGGACCTAAGCGTAGTGTA 
12550 12560 12570 12580 12590 12600

TTTGCGCCTTTATTTGCGACTCAAAAAGCAACTTTACCTGTAATGGGCAAGCTAGCAGAA 

12610 12620 12630 12640 12650 12660 

AAAACAAATGCACTCGTTGTTCCTGTTTATGCGGCATATAATGAATCACTAGGTAAATTT 

12670 12680 12690 12700 12710 12720 
GAAACCTTTATTCGACCAGCAATGCAAAACTTTCCATCAGAAAGCCCAGAACAAGATGCA 

12730 12740 12750 12760 12770 12780 

GTGATGATGAATAAAGAGATTGAAGCCTTGATTGAATGTGGTGTTGATCAATATATGTGG 

12790 12800 12810 12820 12830 12840 

ACACTTAGATTATTGAGAACACGTCCGGACGGTAAAAAAATCTACTAATAAAGTTTAATA 

12850 12860 12870 12880 12890 12900 

AACACCATAATCTTCGTTGAATATGGTGTTTACCCCCCTGAATACCCTCTAAATTAATAA 

12910 12920 12930 12940 12950 12960 

CAAAAAAAGCCATTTACGTAACATCTAATGATGATTTAGCCTGCACTTGCTTTGTTTTTA 

12970 12980 12990 13000 13010 13020 

GTCTTAAGAGCCTAATAAACTTGATCTAGGTATAGATTCTGTCTTTCTTTACGTAACGCG 

13030 13040 13050 13060 13070 13080 

ATCTATTTTTTTTAACCGATAGTTGTTATAATTAGTTTCATATGAAAGAGATATCGTTTC 

13090 13100 13110 13120 13130 13140 
AGTAAAAGCTATTTCGTTTCAATAGATAATTTATTTATAGTCATATTTTCTGTAATGACA 

13150 13160 13170 13180 13190 13200 

ATCATTTTCTCATCTAGACTATAGATAAGAATACGAATTAAGTAAGAACATTAATTTTAC 

13210 13220 13230 13240 13250 13260 

AAGAATATAAAATATCCCATCGGAGCTATAAGAATGAAAAAGACTAAAATTGTTTGTACA 

13270 13280 13290 13300 13310 13320 

ATTGGTCCAAAAACTGAATCAGTAGAGAAACTAACAGAGCTTGTTAATGCAGGCATGAAC 

13330 13340 13350 13360 13370 13380 

GTTATGCGTTTAAATTTCTCTCATGGTAACTTTGCTGAACATTCAGTGCGTATTCAAAAT 

13390 13400 13410 13420 13430 13440 

ATCCCTCAAGTAAGTGAAAACCTGAATAAGAAAATTGCTGTTTTACTGGATACTAAAGGT 

13450 13460 13470 13480 13490 13500 

CCAGAAATCCGTACGATTAAACTAGAAAACGGTGACGATGTAATGTTGACCGCTGGTCAG 

13510 13520 13530 13540 13550 13560 

TCATTCACGTTTACAACAGACATTAACGTGGTAGGTAATAAAGACTGTGTTGCTGTAACA 

13570 13580 13590 13600 13610 13620 
TATGCTGGTTTTGCTAAAGACCTTAATCCTGGTGCAATCATCCTTGTTGATGATGGTTTA 

13630 13640 13650 13660 13670 13680 

ATTGAAATGGAAGTTGTTGCAACAACTGACAQTGAAGTTAAATGTACAGTATTAAATACT 


13690 13700 13710 13720 13730 13740 

GGTGCACTTGGTGAAAATAAAGGCGTTAACTTACCTAACATCAGTGTAGGTCTACCTGCA 

13750 13760 13770 13780 13790 13800 

TTGTCAGAAAAAGATAAAGCTGATTTAGCGTTTGGTTGTGAGCAAGAAGTTGATTTTGTT 

13810 13820 13830 13840 13850 13860 
GCTGCATCATTTATTCGTAAGGCTGATGATGTAAGAGAAATTCGTGAAATCCTATTTAAT 

13870 13880 13890 13900 13910 13920 

AATGGTGGCGAAAACATTCAGATTATCTCGAAAATTGAAAACCAAGAAGGTGTAGACAAT 

13930 13940 13950 13960 13970 13980 

TTCGATGAAATCTTAGCTGAATCAGACGGTATCATGGTTGCTCGTGGCGATCTCGGTGTT 

13990 14000 14010 14020 14030 14040 

GAGATCCCAGTTGAAGAAGTGATCATGGCACAGAAGATGATGATCAAAAAATGTAATAAA 

14050 14060 14070 14080 14090 14100 

GCAGGTAAAGTTGTAATTACTGCAACACAAATGCTTGATTCAATGATCAGTAACCCACGT 

14110 14120 14130 14140 14150 14160 

CCAACACGTGCAGAAGCGGGCGATGTTGCCAATGCTGTGCTTGACGGTACCGACGCGGTA 

14170 14180 14190 14200 14210 14220 

ATGCTTTCTGGTGAAACTGCGAAAGGTAAATACCCAGTTGAAGCTGTGTCTATCATGGCA 

14230 14240 14250 14260 14270 14280 

AACATCTGTGAACGTACTGATAACTCAATGTCTTCGGATTTAGGTGCGAACATTGTTGCT 

14290 14300 14310 14320 14330 14340 

AAAAGCATGCGCATTACAGAAGCTGTGTGTAAAGGTGCGGTAGAAACAACAGAAAAATTG 

14350 14360 14370 14380 14390 14400 

TGTGCTCCACTTATTGTTGTTGCAACTCGTGGCGGTAAATCAGCAAAATCTGTTCGTAAA 

14410 14420 14430 14440 14450 14460 

TACTTCCCGAAAGCAAATATTCTTGCTATCACAACAAATGAAAAAGCAGCGCAACAGTTA 

14470 14480 14490 14500 14510 14520 

TGCCTAACTAAAGGCGTAAGCAGCTGCATCGTTGAGCAGATTGATAGCACTGATGAGTTC 

14530 14540 14550 14560 14570 14580 

TACCGTAAAGGTAAAGAGCTTGCATTAGCAACTGGTTTAGCTAAAGAAGGCGATATCGTT 

14590 14600 14610 14620 14630 14640 

GTTATGGTATCAGGTGCGTTAGTACCATCAGGTACAACGAATACGGCATCTGTTCACCAA 

14650 14660 14670 14680 14690 14700 

CTTTAAGTTGCCATATTGATATTATAAAAAAGAGAGCGTATGCTCTCTTTTTTTATATCT 

14710 14720 14730 14740 14750 14760 

GTAGTTTATATGTCTGTACAAAAAAATGATAAAGAGTACATAAACTATTAATATAGCGTA 

14770 14780 14790 14800 14810 14820 

ATATATAATGATTAACGGTGATGAAAGGGTTAAATAAATGGATAGTGCTAAACATAAAAT 
14830 14840 14850 14860 14870 14880

TGGCTTAGTCCTTTCTGGCGGTGGTGCGAAAGGTATTGCTCATCTTGGTGTATTAAAATA 

14890 14900 14910 14920 14930 14940

CCTGTTAGAGCAAGATATAAGACCGAATGTAATTGCGGGTACAAGTGCTGGCTCTATGGT 

14950 14960 14970 14980 14990 15000 

TGGTGCACTTTATTGCTCAGGACTTGAGATTGATGACATTTTACAATTCTTCATCGATGT 

15010 15020 15030 15040 15050 15060 
AAAACCTTTTTCTTGGAAGTTTACCCGTGCCCGTGCTGGCTTTATAGACCCGGCAAAATT 

15070 15080 15090 15100 15110 15120 

ATATCCTGAAGTGCTAAAATATATCCCCGAGGATAGCTTTGAGTACCTTCAACCTGAATT 

15130 15140 15150 15160 15170 15180 

GCGCATTGTTGCCACCAACATGTTACTCGGTAAAGAGCATATATTTAAAGATGGCTCCGT 

15190 15200 15210 15220 15230 15240 

GATTAATGCCTTATTAGCATCAGCCAGCTACCCTTTAGTTTTTTCTCCGAO'GATCATTGA 

15250 15260 15270 15280 15290 15300 
CGATCAAGTGTATTCAGATGGCGGTATTGTTAATCATTTCCCCGTGAGTGTCATTGAAGA 

15310 15320 15330 15340 15350 15360 

TGATTGCGATAAAATAATCGGCGTATACGTGTCGCCCATTCGTCAGGTCGAAGCTGACGA 

15370 15380 15390 15400 15410 15420 

ACTCTCGAGTATAAAAGACGTGGTATTACGTGCGTTCACGCTGCAGGGTAGTGGTGCTGA 

15430 15440 15450 15460 15470 15480 

ATTAGATAAACTATCGCAATGTGATGTGCAAATTTATCCAGAAGCGCTATTGAATTACAA 

15490 15500 15510 15520 15530 15540 

TACGTTTGCAACCGATGAAAAATCATTACGGGAGATCTACCAGATTGGTTATGATGCTGC 

15550 15560 15570 15580 15590 15600 

AAAAGATCAACATGACAACCTTATGGCATTGAAAGAAAGTATCACCACCAGCGAGGTTAA 

15610 15620 15630 15640 15650 15660 

AAAGAACGTCTTTAGCAAATGGTTTGGTGATAAACTTGCTAGCAACAGCGGCAAATAGCG 

15670 15680 15690 15700 15710 15720 

GCCCACACGGATTTATACACTAGGATAATGGGCGTTAATAGCCTCACTGTCGTTGTGTGG 

15730 15740 15750 15760 15770 15780 

TCTCTAATTTTAGCTAAATCTTGTGTTATACTGACTTCCTATTAATCATAAACGATTTAT 

15790 15800 15810 15820 15830 15840 

CACGGTAAACATGACTCAAATAAATAACCCGCTTCACGGCATGACACTCGAAAAAGTAAT 

15850 15860 15870 15880 15890 15900 

TAACAGTCTCGTTGAACAATATGGCTGGGATGGTCTTGGATACTACATCAACATTCGTTG 

15910 15920 15930 15940 15950 15960 

CTTTACTGAAAATCCAAGTGTTAAGTCTAGTCTTAAATTTTTACGTAAAACCCCTTGGGC 
15970 15980 15990 16000 16010 16020

ACGTGATAAAGTAGAAGCGCTATATATCAAAATGGTGACTGAAGGCTAACTGTCTCCACG

16030 16040 16050 16060 16070 16080 

CTAGCGAACCGCTGTTTATAGTTAATATAAGTACTATAAGCAGGGCTCGTTAATTCAGTA 

16090 16100 16110 16120 16130 16140 
TGTAATTAATCCTGAATACCTCCGCTTATTTCAACATTGTACTCTCTAGATAACACTCTC

16150 16160 16170 16180 16190 16200 

AACATTACACCTTCAACATCACAGCCTCCACATAACATCCGATGACATAGCCCTGTTATT 

16210 16220 16230 16240 16250 16260 
TTTCACATTTATCTATATGCTATATATTTTAGCCATTTGATCAATTGAGTTAATTTCTGC 

16270 16280 16290 16300 16310 16320 

AATGACAAAGATATACCATCATCCAGTACAAATTTATTATGAAGATACCGACCATTCTGG 

16330 16340 16350 16360 16370 16380 
TGTTGTTTACCACCCTAACTTTTTAAAATACTTTGAACGTGCACGTGAGCATGTGATAAA 

16390 16400 16410 16420 16430 16440 

TAGTGACTTACTAGCAACATTGTGGAATGAACGCGGTTTAGGTTTTGCGGTGTATAAAGC 

16450 16460 16470 16480 16490 16500 

CAATATGACTTTTCAGGATGGGGTCGAATTTGCTGAAGTGTGTGATATTCGCACTTCTTT 

16510 16520 16530 16540 16550 16560 

TGTCCTAGACGGTAAGTACAAAACGATCTGGCGCCAAGAAGTATGGCGTCCGAATGCGAC 

16570 16580 16590 16600 16610 16620 

TAGGGCTGCCGTTATCGGTGATATTGAAATGGTGTGCTTAGACAAACAAAAACGTTTACA 

16630 16640 16650 16660 16670 16680

GCCCATCCCTGATGATGTGTTAGCTGCAATGGTTAGTGAATAAATGGTTCATGCATAAAT 

16690 16700 16710 16720 16730 16740 

AGTTAATACATGATTCTGGCCCGTCACGTTTACAGATAAGAGGCATCCGATGCCTCCTTC 

16750 16760 16770 16780 16790 16800 

CTATTACCAATACTACTGCTTATCCCTTTCTAACTATCTTTAGCGTCCATAACACACTGA 

16810 16820 16830 16840 16850 16860 

GCATTTATTCTATTAATCAGTGATTGTGATTTAATTATCTTCTATATATGTAATTTAATG 

16870 16880 16890 16900 16910 16920 

TAATTTTCAATTTATTTTTAGCTACATTAAGGCTTACGAATGTACGCTAAAATGAGATGT 

16930 16940 16950 16960 16970 16980 

CAGACTAATTTTAGCTTATTAATCTGTTAGCCGTTTATATTTTATAAAGATGGGATTTAA 

16990 17000 17010 17020 17030 17040 

CTTAAATGCAATTAATTATGGCGTAAATAGAGTGAAAACATGGCTAATATTCACTAAGTC 

17050 17060 17070 17080 17090 17100 

CTGAATTTTATATAAAGTTTAATCTGTTATTTTAGCGTTTACCTGGTCTTATCAGTGAGG 
17110 17120 17130 17140 17150 17160

TTTATAGCCATTATTAGTGGGATTGAAGTGATTTTTAAAGCTATGTATATTATTGCAAAT 

17170 17180 17190 17200 17210 17220

ATAAATTGTAACAATTAAGACTTTGGACACTTGAGTTCAATTTCGAATTGATTGGCATAA 

17230 17240 17250 17260 17270 17280 

AATTTAAAACAGCTAAATCTACCTCAATCATTTTAGCAAATGTATGCAGGTAGATTTTTT 

17290 17300 17310 17320 17330 17340 

TCGCCATTTAAGAGTACACTTGTACGCTAGGTTTTTGTTTAGTGTGCAAATGAACGTTTT 

17350 17360 17370 17380 17390 17400 

GATGAGCATTGTTTTTAGAGCACAAAATAGATCCTTACAGGAGCAATAACGCAATGGCTA 

17410 17420 17430 17440 17450 17460 

AAAAGAACACCACATCGATTAAGCACGCCAAGGATGTGTTAAGTAGTGATGATCAACAGT 

17470 17480 17490 17500 17510 17520 

TAAATTCTCGCTTGCAAGAATGTCCGATTGCCATCATTGGTATGGCATCGGTTTTTGCAG 

17530 17540 17550 17560 17570 17580 

ATGCTAAAAACTTGGATCAAXTCTGGGATAACATCGTTGACTCTGTGGACGCTATTATTG 

17590 17600 17610 17620 17630 17640 

ATGTGCCTAGCGATCGCTGGAACATTGACGACCATTACTCGGCTGATAAAAAAGCAGCTG 

17650 17660 17670 17680 17690 17700 

ACAAGACATACTGCAAACGCGGTGGTTTCATTCCAGAGCTTGATTTTGATCCGATGGAGT 

17710 17720 17730 17740 17750 17760 

TTGGTTTACCGCCAAATATCCTCGAGTTAACTGACATCGCTCAATTGTTGTCATTAATTG 

17770 17780 17790 17800 17810 17820 

TTGCTCGTGATGTATTAAGTGATGCTGGCATTGGTAGTGATTATGACCATGATAAAATTG 

17830 17840 17850 17860 17870 17880 

GTATCACGCTGGGTGTCGGTGGTGGTCAGAAACAAATTTCGCCATTAACGTCGCGCCTAC 

17890 17900 17910 17920 17930 17940 

AAGGCCCGGTATTAGAAAAAGTATTAAAAGCCTCAGGCATTGATGAAGATGATCGCGCTA 

17950 17960 17970 17980 17990 18000 

TGATCATCGACAAATTTAAAAAAGCCTACATCGGCTGGGAAGAGAACTCATTCCCAGGCA 

18010 18020 18030 18040 18050 18060 

TGCTAGGTAACGTTATTGCTGGTCGTATCGCCAATCGTTTTGATTTTGGTGGTACTAACT 

18070 18080 18090 18100 18110 18120 

GTGTGGTTGATGCGGCATGCGCTGGCTCCCTTGCAGCTGTTAAAATGGCGATCTCAGACT 

18130 18140 18150 18160 18170 18180 
TACTTGAATATCGTTCAGAAGTCATGATATCGGGTGGTGTATGTTGTGATAACTCGCCAT 

18190 18200 18210 18220 18230 18240 

TCATGTATATGTCATTCTCGAAAACACCAGCATTTACCACCAATGATGATATCCGTCCGT 
18250 18260 18270 18280 18290 18300

TTGATGACGATTCAAAAGGCATGCTGGTTGGTGAAGGTATTGGCATGATGGCGTTTAAAC 

18310 18320 18330 18340 18350 18360 

GTCTTGAAGATGCTGAACGTGACGGCGACAAAATTTATTCTGTACTGAAAGGTATCGGTA 

18370 18380 18390 18400 18410 18420 
CATCTTCAGATGGTCGTTTCAAATCTATTTACGCTCCACGCCCAGATGGCCAAGCAAAAG 

18430 18440 18450 18460 18470 18480

CGCTAAAACGTGCTTATGAAGATGCCGGTTTTGCCCCTGAAACATGTGGTCTAATTGAAG 

18490 18500 18510 18520 18530 18540 

GCCATGGTACGGGTACCAAAGCGGGTGATGCCGCAGAATTTGCTGGCTTGACCAAACACT 

18550 18560 18570 18580 18590 18600 

TTGGCGCCGCCAGTGATGAAAAGCAATATATCGCCTTAGGCTCAGTTAAATCGCAAATTG 

18610 18620 18630 18640 18650 18660

GTCATACTAAATCTGCGGCTGGCTCTGCGGGTATGATTAAGGCGGCATTAGCGCTGCATC 

18670 18680 18690 18700 18710 18720

ATAAAATCTTACCTGCAACGATCCATATCGATAAACCAAGTGAAGCCTTGGATATCAAAA 

18730 18740 18750 18760 18770 18780 

ACAGCCCGTTATACCTAAACAGCGAAACGCGTCCTTGGATGCCACGTGAAGATGGTATTC 

18790 18800 18810 18820 18830 18840

CACGTCGTGCAGGTATCAGCTCATTTGGTTTTGGCGGCACCAACTTCCATATTATTTTAG 

18850 18860 18870 18880 18890 18900 

AAGAGTATCGCCCAGGTCACGATAGCGCATATCGCTTAAACTCAGTGAGCCAAACTGTGT 

18910 18920 18930 18940 18950 18960

TGATCTCGGCAAACGACCAACAAGGTATTGTTGCTGAGTTAAATAACTGGCGTACTAAAC 

18970 18980 18990 19000 19010 19020 

TGGCTGTCGATGCTGATCATCAAGGGTTTGTATTTAATGAGTTAGTGACAACGTGGCCAT 

19030 19040 19050 19060 19070 19080 

TAAAAACCCCATCCGTTAACCAAGCTCGTTTAGGTTTTGTTGCGCGTAATGCAAATGAAG

19090 19100 19110 19120 19130 19140 
CGATCGCGATGATTGATACGGCATTGAAACAATTCAATGCGAACGCAGATAAAATGACAT 

19150 19160 19170 19180 19190 19200 

GGTCAGTACCTACCGGGGTTTACTATCGTCAAGCCGGTATTGATGCAACAGGTAAAGTGG 

19210 19220 19230 19240 19250 19260 

TTGCGCTATTCTCAGGGCAAGGTTCGCAATACGTGAACATGGGTCGTGAATTAACCTGTA 

19270 19280 19290 19300 19310 19320 
ACTTCCCAAGCATGATGCACAGTGCTGCGGCGATGGATAAAGAGTTCAGTGCCGCTGGTT 

19330 19340 19350 19360 19370 19380 

TAGGCCAGTTATCTGCAGTTACTTTCCCTATCCCTGTTTATACGGATGCCGAGCGTAAGC 
19390 19400 19410 19420 19430 19440

TACAAGAAGAGCAATTACGTTTAACGCAACATGCGCAACCAGCGATTGGTAGTTTGAGTG 

19450 19460 19470 19480 19490 19500 

TTGGTCTGTTCAAAACGTTTAAGCAAGCAGGTTTTAAAGCTGATTTTGCTGCCGGTCATA 

19510 19520 19530 19540 19550 19560 
GTTTCGGTGAGTTAACCGCATTATGGGCTGCCGATGTATTGAGCGAAAGCGATTACATGA 

19570 19580 19590 19600 19610 19620 

TGTTAGCGCGTAGTCGTGGTCAAGCAATGGCTGCGCCAGAGCAACAAGATTTTGATGCAG 

19630 19640 19650 19660 19670 19680 
GTAAGATGGCCGCTGTTGTTGGTGATCCAAAGCAAGTCGCTGTGATCATTGATACCCTTG 

19690 19700 19710 19720 19730 19740 

ATGATGTCTCTATTGCTAACTTCAACTCGAATAACCAAGTTGTTATTGCTGGTACTACGG 

19750 19760 19770 19780 19790 19800 

AGCAGGTTGCTGTAGCGGTTACAACCTTAGGTAATGCTGGTTTCAAAGTTGTGCCACTGC 

19810 19820 19830 19840 19850 19860 

CGGTATCTGCTGCGTTCCATACACCTTTAGTTCGTCACGCGCAAAAACCATTTGCTAAAG 

19870 19880 19890 19900 19910 19920 

CGGTTGATAGCGCTAAATTTAAAGCGCCAAGCATTCCAGTGTTTGCTAATGGCACAGGCT 

19930 19940 19950 19960 19970 19980 

TGGTGCATTCAAGCAAACCGAATGACATTAAGAAAAACCTGAAAAACCACATGCTGGAAT 

19990 20000 20010 20020 20030 20040 

CTGTTCATTTCAATCAAGAAATTGACAACATCTATGCTGATGGTGGCCGCGTATTTATCG 

20050 20060 20070 20080 20090 20100 

AATTTGGTCCAAAGAATGTATTAACTAAATTGGTTGAAAACATTCTCACTGAAAAATCTG 

20110 20120 20130 20140 20150 20160 

ATGTGACTGCTATCGCGGTTAATGCTAATCCTAAACAACCTGCGGACGTACAAATGCGCC 

20170 20180 20190 20200 20210 20220 

AAGCTGCGCTGCAAATGGCAGTGCTTGGTGTCGCATTAGACAATATTGACCCGTACGACG 

20230 20240 20250 20260 20270 20280 

CCGTTAAGCGTCCACTTGTTGCGCCGAAAGCATCACCAATGTTGATGAAGTTATCTGCAG 

20290 20300 20310 20320 20330 20340 

CGTCTTATGTTAGTCCGAAAACGAAGAAAGCGTTTGCTGATGCATTGACTGATGGCTGGA 

20350 20360 20370 20380 20390 20400 

CTGTTAAGCAAGCGAAAGCTGTACCTGCTGTTGTGTCACAACCACAAGTGATTGAAAAGA 

20410 20420 20430 20440 20450 20460 

TCGTTGAAGTTGAAAAGATAGTTGAACGCATTGTCGAAGTAGAGCGTATTGTCGAAGTAG 

20470 20480 20490 20500 20510 20520 

AAAAAATCGTCTACGTTAATGCTGACGGTTCQCTTATATCGCAAAATAATCAAGACGTTA 
20530 20540 20550 20560 20570 20580

ACAGCGCTGTTGTTAGCAACGTGACTAATAGCTCAGTGACTCATAGCAGTGATGCTGACC 

20590 20600 20610 20620 20630 20640 

TTGTTGCCTCTATTGAACGCAGTGTTGGTCAATTTGTTGCACACCAACAGCAATTATTAA 

20650 20660 20670 20680 20690 20700 

ATGTACATGAACAGTTTATGCAAGGTCCACAAGACTACGCGAAAACAGTGCAGAACGTAC 

20710 20720 20730 20740 20750 20760 

TTGCTGCGCAGACGAGCAATGAATTACCGGAAAGTTTAGACCGTACATTGTCTATGTATA 

20770 20780 20790 20800 20810 20820

ACGAGTTCCAATCAGAAACGCTACGTGTACATGAAACGTACCTGAACAATCAGACGAGCA 

20830 20840 20850 20860 20870 20880 

ACATGAACACCATGCTTACTGGTGCTGAAGCTGATGTGCTAGCAACCCCAATAACTCAGG 

20890 20900 20910 20920 20930 20940 

TAGTGAATACAGCCGTTGCCACTAGTCACAAGGTAGTTGCTCCAGTTATTGCTAATACAG 

20950 20960 20970 20980 20990 21000 
TGACGAATGTTGTATCTAGTGTCAGTAATAACGCGGCGGTTGCAGTGCAAACTGTGGCAT 

21010 21020 21030 21040 21050 21060 

TAGCGCCTACGCAAGAAATCGCTCCAACAGTCGCTACTACGCCAGCACCCGCATTGGTTG 

21070 21080 21090 21100 21110 21120 

CTATCGTGGCTGAACCTGTGATTGTTGCGCATGTTGCTACAGAAGTTGCACCAATTACAC 

21130 21140 21150 21160 21170 21180 

CATCAGTTACACCAGTTGTCGCAACTCAAGCGGCTATCGATGTAGCAACTATTAACAAAG 

21190 21200 21210 21220 21230 21240 
TAATGTTAGAAGTTGTTGCTGATAAAACCGGTTATCCAACGGATATGCTGGAACTGAGCA 

21250 21260 21270 21280 21290 21300 

TGGACATGGAAGCTGACTTAGGTATCGACTCAATCAAACGTGTTGAGATATTAGGCGCAG 

21310 21320 21330 21340 21350 21360 

TACAGGAATTGATCCCTGACTTACCTGAACTTAATCCTGAAGATCTTGCTGAGCTACGCA 

21370 21380 21390 21400 21410 21420 

CGCTTGGTGAGATTGTCGATTACATGAATTCAAAAGCCCAGGCTGTAGCTCCTACAACAG 

21430 21440 21450 21460 21470 21480 
TACCTGTAACAAGTGCACCTGTTTCGCCTGCATCTGCTGGTATTGATTTAGCCCACATCC 

21490 21500 21510 21520 21530 21540 

AAAACCTAATGTTAGAAGTGGTTGCAGACAAAACCGGTTACCCAACAGACATGCTAGAAC 

21550 21560 21570 21580 21590 21600 

TGAGCATGGATATGGAAGCTGACTTAGGTATTGATTCAATCAAGCGTGTGGAAATCTTAG 

21610 21620 21630 21640 21650 21660 

GTGCAGTACAGGAGATCAT7UVCTGATTTACCTGAGCTAAACCCTGAAGATCTTGCTGAAT 
21670 21680 21690 21700 21710 21720

TACGCACCCTAGGTGAAATCGTTAGTTACATGCAAAGCAAAGCGCCAGTCGCTGAAAGTG 

21730 21740 21750 21760 21770 21780

CGCCAGTGGCGACGGCTCCTGTAGCAACAAGCTCAGCACCGTCTATCGATTTGAACCACA 

21790 21800 21810 21820 21830 21840
TTCAAACAGTGATGATGGATGTAGTTGCAGATAAGACTGGTTATCCAACTGACATGCTAG 

21850 21860 21870 21880 21890 21900 

AACTTGGCATGGACATGGAAGCTGATTTAGGTATCGATTCAATCAAACGTGTGGAAATAT 

21910 21920 21930 21940 21950 21960 

TAGGCGCAGTGCAGGAGATCATCACTGATTTACCTGAGCTAAACCCAGAAGACCTCGCTG 

21970 21980 21990 22000 22010 22020 

AATTACGCACGCTAGGTGAAATCGTTAGTTACATGCAAAGCAAAGCGCCAGTCGCTGAGA 

22030 22040 22050 22060 22070 22080 
GTGCGCCAGTAGCGACGGCTTCTGTAGCAACAAGCTCTGCACCGTCTATCGATTTAAACC 

22090 22100 22110 22120 22130 22140 

ATATCCAAACAGTGATGATGGAAGTGGTTGCAGACAAAACCGGTTATCCAGTAGACATGT 

22150 22160 22170 22180 22190 22200 

TAGAACTTGCTATGGACATGGAAGCTGACCTAGGTATCGATTCAATCAAGCGTGTAGAAA 

22210 22220 22230 22240 22250 22260 

TTTTAGGTGCGGTACAGGAAATCATTACTGACTTACCTGAGCTTAACCCTGAAGATCTTG 

22270 22280 22290 22300 22310 22320 

CTGAACTACGTACATTAGGTGAAATCGTTAGTTACATGCAAAGCAAAGCGCCCGTAGCTG 

22330 22340 22350 22360 22370 22380 

AAGCGCCTGCAGTACCTGTTGCAGTAGAAAGTGCACCTACTAGTGTAACAAGCTCAGCAC 

22390 22400 22410 22420 22430 22440 

CGTCTATCGATTTAGACCACATCCAAAATGTAATGATGGATGTTGTTGCTGATAAGACTG 

22450 22460 22470 22480 22490 22500 

GTTATCCTGCCAATATGCTTGAATTAGCAATGGACATGGAAGCCGACCTTGGTATTGATT 

22510 22520 22530 22540 22550 22560 
CAATCAAGCGTGTTGAAATTCTAGGCGCGGTACAGGAGATCATTACTGATTTACCTGAAC 

22570 22580 22590 22600 22610 22620 

TAAACCCAGAAGACTTAGCTGAACTACGTACGTTAGAAGAAATTGTAACCTACATGCAAA 

22630 22640 22650 22660 22670 22680 

GCAAGGCGAGTGGTGTTACTGTAAATGTAGTGGCTAGCCCTGAAAATAATGCTGTATCAG 

22690 22700 22710 22720 22730 22740 
ATGCATTTATGCAAAGCAATGTGGCGACTATCACAGCGGCCGCAGAACATAAGGCGGAAT 

22750 22760 22770 22780 22790 22800 

TTAAACCGGCGCCGAGCGCAACCGTTGCTATCTCTCGTCTAAGCTCTATCAGTAAAATAA 
22810 22820 22830 22840 22850 22860
GCCAAGATTGTAAAGGTGCTAACGCCTTAATCGTAGCTGATGGCACTGATAATGCTGTGT 

22870 22880 22890 22900 22910 22920
TACTTGCAGACCACCTATTGCAAACTGGCTGGAATGTAACTGCATTGCAACCAACTTGGG 

22930 22940 22950 22960 22970 22980 
TAGCTGTAACAACGACGAAAGCATTTAATAAGTCAGTGAACCTGGTGACTTTAAATGGCG 

22990 23000 23010 23020 23030 23040 

TTGATGAAACTGAAATCAACAACATTATTACTGCTAACGCACAATTGGATGCAGTTATCT 

23050 23060 23070 23080 23090 23100
ATCTGCACGCAAGTAGCGAAATTAATGCTATCGAATACCCACAAGCATCTAAGCAAGGCC 

23110 23120 23130 23140 23150 23160 

TGATGTTAGCCTTCTTATTAGCGAAATTGAGTAAAGTAACTCAAGCCGCTAAAGTGCGTG 

23170 23180 23190 23200 23210 23220 

GCGCCTTTATGATTGTTACTCAGCAGGGTGGTTCATTAGGTTTTGATGATATCGATTCTG 

23230 23240 23250 23260 23270 23280 

CTACAAGTCATGATGTGAAAACAGACCTAGTACAAAGCGGCTTAAACGGTTTAGTTAAGA 

23290 23300 23310 23320 23330 23340 
CACTGTCTCACGAGTGGGATAACGTATTCTGTCGTGCGGTTGATATTGCTTCGTCATTAA 

23350 23360 23370 23380 23390 23400
CGGCTGAACAAGTTGCAAGCCTTGTTAGTGATGAACTACTTGATGCTAACACTGTATTAA 

23410 23420 23430 23440 23450 23460 

CAGAAGTGGGTTATCAACAAGCTGGTAAAGGCCTTGAACGTATCACGTTAACTGGTGTGG

23470 23480 23490 23500 23510 23520 

CTACTGACAGCTATGCATTAACAGCTGGCAATAACATCGATGCTAACTCGGTATTTTTAG 

23530 23540 23550 23560 23570 23580 

TGAGTGGTGGCGCAAAAGGTGTAACTGCACATTGTGTTGCTCGTATAGCTAAAGAATATC 

23590 23600 23610 23620 23630 23640 

AGTCTAAGTTCATCTTATTGGGACGTTCAACGTTCTCAAGTGACGAACCGAGCTGGGCAA 

23650 23660 23670 23680 23690 23700 

GTGGTATTACTGATGAAGCGGCGTTAAAGAAAGCAGCGATGCAGTCTTTGATTACAGCAG 

23710 23720 23730 23740 23750 23760 

GTGATAAACCAACACCCGTTAAGATCGTACAGCTAATCAAACCAATCCAAGCTAATCGTG 

23770 23780 23790 23800 23810 23820 

AAATTGCGCAAACCTTGTCTGCAATTACCGCrrGCTGGTGGCCAAGCTGAATATGTTTCTG 

23830 23840 23850 23860 23870 23880 

CAGATGTAACTAATGCAGCAAGCGTACAAATGGCAGTCGCTCCAGCTATCGCTAAGTTCG 

23890 23900 23910 23920 23930 23940 

GTGCAATCACTGGCATCATTCATGGCGCGGGTGTGTTAGCIGACCAATTCATTGAGCAAA 
23950 23960 23970 23980 23990 24000

AAACACTGAGTGATTTTGAGTCTGTTTACAGCACTAAAATTGACGGTTTGTTATCGCTAC 

24010 24020 24030 24040 24050 24060 

TATCAGTCACTGAAGCAAGCAACATCAAGCAATTGGTATTGTTCTCGTCAGCGGCTGGTT 

24070 24080 24090 24100 24110 24120 

TCTACGGTAACCCCGGCCAGTCTGATTACTCGATTGCCAATGAGATCTTAAATAAAACCG 

24130 24140 24150 24160 24170 24180 

CATACCGCTTTAAATCATTGCACCCACAAGCTCAAGTATTGAGCTTTAACTGGGGTCCTT 

24190 24200 24210 24220 24230 24240 

GGGACGGTGGCATGGTAACGCCTGAGCTTAAACGTATGTTTGACCAACGTGGTGTTTACA 

24250 24260 24270 24280 24290 24300 

TTATTCCACTTGATGCAGGTGCACAGTTATTGCTGAATGAACTAGCCGCTAATGATAACC 

24310 24320 24330 24340 24350 24360 

GTTGTCCACAAATCCTCGTGGGTAATGACTTATCTAAAGATGCTAGCTCTGATCAAAAGT 

24370 24380 24390 24400 24410 24420 

CTGATGAAAAGAGTACTGCTGTAAAAAAGCCACAAGTTAGTCGTTTATCAGATGCTTTAG 

24430 24440 24450 24460 24470 24480 

TAACTAAAAGTATCAAAGCGACTAACAGTAGCTCTTTATCAAACAAGACTAGTGCTTTAT 

24490 24500 24510 24520 24530 24540 

CAGACAGTAGTGCTTTTCAGGTTAACGAAAACCACTTTTTAGCTGACCACATGATCAAAG 

24550 24560 24570 24580 24590 24600 

GCAATCAGGTATTACCAACGGTATGCGCGATTGCTTGGATGAGTGATGCAGCAAAAGCGA

24610 24620 24630 24640 24650 24660 

CTTATAGTAACCGAGACTGTGCATTGAAGTATGTCGGTTTCGAAGACTATAAATTGTTTA 

24670 24680 24690 24700 24710 24720 

AAGGTGTGGTTTTTGATGGCAATGAGGCGGCGGATTACCAAATCCAATTGTCGCCTGTGA

24730 24740 24750 24760 24770 24780 

CAAGGGCGTCAGAACAGGATTCTGAAGTCCGTATTGCCGCAAAGATCTTTAGCCTGAAAA 

24790 24800 24810 24820 24830 24840 

GTGACGGTAAACCTGTGTTTCATTATGCAGCGACAATATTGTTAGCAACTCAGCCACTTA 

24850 24860 24870 24880 24890 24900 

ATGCTGTGAAGGTAGAACTTCCGACATTGACAGAAAGTGTTGATAGCAACAATAAAGTAA 

24910 24920 24930 24940 24950 24960 

CTGATGAAGCACAAGCGTTATACAGCAATGGCACCTTGTTCCACGGTGAAAGTCTGCAGG 

24970 24980 24990 25000 25010 25020 
GCATTAAGCAGATATTAAGTTGTGACGACAAGGGCCTGCTATTGGCTTGTCAGATAACCG

25030 25040 25050 25060 25070 25080 

ATGTTGCAACAGCTAAGCAGGGATCCTTCCCQJTAGCTGACAACAATATCTTTGCCAATG 
25090 25100 25110 25120 25130 25140

ATTTGGTTTATCAGGCTATGTTGGTCTGGGTGCGCAAACAATTTGGTTTAGGTAGCTTAC 

25150 25160 25170 25180 25190 25200 
CTTCGGTGACAACGGCTTGGACTGTGTATCGTGAAGTGGTTGTAGATGAAGTATTTTATC

25210 25220 25230 25240 25250 25260 
TGCAACTTAATGTTGTTGAGCATGATCTATTGGGTTCACGCQGCAGTAAAGCCCGTTGTG 

25270 25280 25290 25300 25310 25320 

ATATTCAATTGATTGCTGCTGATATGCAATTACTTGCCGAAGTGAAATCAGCGCAAGTCA 

25330 25340 25350 25360 25370 25380
GTGTCAGTGACATTTTGAACGATATGTCATGATCGAGTAAATAATAACGATAGGCGTCAT 

25390 25400 25410 25420 25430 25440 

GGTGAGCATGGCGTCTGCTTTCTTCATTTTTTAACATTAACAATATTAATAGCTAAACGC 

25450 25460 25470 25480 25490 25500 
GGTTGCTTTAAACCAAGTAAACAAGTGCTTTTAGCTATTACTATTCCAAACAGGATATTA 

25510 25520 25530 25540 25550 25560 

AAGAGAATATGACGGTiATTAGCTGTTATTGGTATGGATGCTAAATTTAGCGGACAAGACA 

25570 25580 25590 25600 25610 25620 

ATATTGACCGTGTGGAACGCGCTTTCTATGAAGGTCCTTATGTAGGTAATGTTAGCCGCG 

25630 25640 25650 25660 25670 25680 

TTAGTACCGAATCTAATGTTATTAGCAATGGCGAAGAACAAGTTATTACTGCCATGACAG 

25690 25700 25710 25720 25730 25740 

TTCTTAACTCTGTCAGTCTACTAGCGCAAACGAATCAGTTAAATATAGCTGATATCGCGG 

25750 25760 25770 25780 25790 25800 

TGTTGCTGATTGCTGATGTAAAAAGTGCTGATGATCAGCTTGTAGTCCAAATTGCATCAG 

25810 25820 25830 25840 25850 25860 

CAATTGAAAAACAGTGTGCGAGTTGTGTTGTTATTGCTGATTTAGGCCAAGCATTAAATC 

25870 25880 25890 25900 25910 25920 

AAGTAGCTGATTTAGTTAATAACCAAGACTGTCCTGTGGCTGTAATTGGCATGAATAACT 

25930 25940 25950 25960 25970 25980 
CGGTTAATTTATCTCGTCATGATCTTGAATCTGTAACTGCAACAATCAGCTTTGATGAAA 

25990 26000 26010 26020 26030 26040 

CCTTCAATGGTTATAACAATGTAGCTGGGTTCGCGAGTTTACTTATCGCTTCAACTGCGT 

26050 26060 26070 26080 26090 26100 

TTGCCAATGCTAAGCAATGTTATATATACGCCAACATTAAGGGCTTCGCTCAATCGGGCG 

26110 26120 26130 26140 26150 26160 

TAAATGCTCAATTTAACGTTGGAAACATTAGCGATACTGCAAAGACCGCATTGCAGCAAG 

26170 26180 26190 26200 26210 26220 
CTAGCATAACTGCAGAGCAGGTTGGTTTGTT^GAAGTGTCAGCAGTCGCTGATTCGGCAA 
26230 26240 26250 26260 26270 26280

TCGCATTGTCTGAAAGCCAAGGTTTAATGTCTGCTTATCATCATACGCAAACTTTGCATA 

26290 26300 26310 26320 26330 26340 
CTGCATTAAGCAGTGCCCGTAGTGTGACTGGTGAAGGCGGGTGTTTTTCACAGGTCGCAG 

26350 26360 26370 26380 26390 26400 
GTTTATTGAAATGTGTAATTGGTTTACATCAACGTTATATTCCGGCGATTAAAGATTGGC 

26410 26420 26430 26440 26450 26460 

AACAACCGAGTGACAATCAAATGTCACGGTGGCGGAATTCACCATTCTATATGCCTGTAG 

26470 26480 26490 26500 26510 26520 

ATGCTCGACCTTGGTTCCCACATGCTGATGGCTCTGCACACATTGCCGCTTATAGTTGTG 

26530 26540 26550 26560 26570 26580 

TGACTGCTGACAGCTATTGTCATATTCTTTTACAAGAAAACGTCTTACAAGAACTTGTTT 

26590 26600 26610 26620 26630 26640 
TGAAAGAAACAGTCTTGCAAGATAATGACTTAACTGAAAGCAAGCTTCAGACTCTTGAAC 

26650 26660 26670 26680 26690 26700 

AAAACAATCCAGTAGCTGATCTGCGCACTAATGGTTACTTTGCATCGAGCGAGTTAGCAT 

26710 26720 26730 26740 26750 26760 

TAATCATAGTACAAGGTAATGACGAAGCACAATTACGCTGTGAATTAGAAACTATTACAG 

26770 26780 26790 26800 26810 26820 

GGCAGTTAAGTACTACTGGCATAAGTACTATCAGTATTAAACAGATCGCAGCAGACTGTT 

26830 26840 26850 26860 26870 26880

ATGCCCGTAATGATACTAACAAAGCCTATAGCGCAGTGCTTATTGCCGAGACTGCTGAAG 

26890 26900 26910 26920 26930 26940 

AGTTAAGCAAAGAAATAACCTTGGCGTTTGCTGGTATCGCTAGCGTGTTTAATGAAGATG 

26950 26960 26970 26980 26990 27000 

CTAAAGAATGGAAAACCCCGAAGGGCAGTTATTTTACCGCGCAGCCTGCAAATAAACAGG 

27010 27020 27030 27040 27050 27060 

CTGCTAACAGCACACAGAATGGTGTCACCTTCATGTACCCAGGTATTGGTGCTACATATG 

27070 27080 27090 27100 27110 27120 

TTGGTTTAGGGCGTGATCTATTTCATCTATTCCCACAGATTTATCAGCCTGTAGCGGCTT 

27130 27140 27150 27160 27170 27180 

TAGCCGATGACATTGGCGAAAGTCTAAAAGATACTTTACTTAATCCACGCAGTATTAGTC 

27190 27200 27210 27220 27230 27240 

GTCATAGCTTTAAAGAACTCAAGCAGTTGGATCTGGACCTGCGCGGTAACTTAGCCAATA 

27250 27260 27270 27280 27290 27300 
TCGCTGAAGCCGGTGTGGGTTTTGCTTGTGTGTTTACCAAGGTATTTGAAGAAGTCTTTG 

27310 27320 27330 27340 27350 27360 

CCGTTAAAGCTGACTTTGCTACAGGTTATAGC/^TGGGTGAAGTAAGCATGTATGCAGCAC 
27370 27380 27390 27400 27410 27420

TAGGCTGCTGGCAGCAACCGGGATTGATGAGTGCTCGCCTTGCACAATCGAATACCTTTA 

27430 27440 27450 27460 27470 27480 

ATCATCAACTTTGCGGCGAGTTAAGAACACTACGTCAGCATTGGGGCATGGATGATGTAG 

27490 27500 27510 27520 27530 27540 

CTAACGGTACGTTCGAGCAGATCTGGGAAACCTATACCATTAAGGCAACGATTGAACAGG 

27550 27560 27570 27580 27590 27600 

TCGAAATTGCCTCTGCAGATGAAGATCGTGTGTATTGCACCATTATCAATACACCTGATA 

27610 27620 27630 27640 27650 27660 

GCTTGTTGTTAGCCGGTTATCCAGAAGCCTGTCAGCGAGTCATTAAGAATTTAGGTGTGC 

27670 27680 27690 27700 27710 27720 

GTGCAATGGCATTGAATATGGCGAACGCAATTCACAGCGCGCCAGCTTATGCCGAATACG 

27730 27740 27750 27760 27770 27780 

ATCATATGGTTGAGCTATACCATATGGATGTTACTCCACGTATTAATACCAAGATGTATT 

27790 27800 27810 27820 27830 27840 

CAAGCTCATGTTATTTACCGATTCCACAACGCAGCAAAGCGATTTCCCACAGTATTGCTA 

27850 27860 27870 27880 27890 27900 

AATGTTTGTGTGATGTGGTGGATTTCCCACGTTTGGTTAATACCTTACATGACAAAGGTG 

27910 27920 27930 27940 27950 27960 

CGCGGGTATTCATTGAAATGGGTCCAGGTCGTTCGTTATGTAGCTGGGTAGATAAGATCT 

27970 27980 27990 28000 28010 28020 

TAGTTAATGGCGATGGCGATAATAAAAAGCAAAGCCAACATGTATCTGTTCCTGTGAATG 

28030 28040 28050 28060 28070 28080

CCAAAGGCACCAGTGATGAACTTACTTATATTCGTGCGATTGCTAAGTTAATTAGTCATG 

28090 28100 28110 28120 28130 28140 

GCGTGAATTTGAATTTAGATAGCTTGTTTAACGGGTCAATCCTGGTTAAAGCAGGCCATA 

28150 28160 28170 28180 28190 28200 

TAGCAAACACGAACAAATAGTCAACATCGATATCTAGCGCTGGTGAGTTATACCTCATTA 

28210 28220 28230 28240 28250 28260 

GTTGAAATATGGATTTAAAGAGAGTAATTATGGAAAATATTGCAGTAGTAGGTATTGCTA 

28270 28280 28290 28300 28310 28320 

ATTTGTTCCCGGGCTCACAAGCACCGGATCAATTTTGGCAGCAATTGCTTGAACAACAAG 

28330 28340 28350 28360 28370 28380 

ATTGCCGCAGTAAGGCGACCGCTGTTCAAATGGGCGTTGATCCTGCTAAATATACCGCCA 

28390 28400 28410 28420 28430 28440

ACAAAGGTGACACAGATAAATTTTACTGTGTGCACGGCGGTTACATCAGTGATTTCAATT 

28450 28460 28470 28480 28490 28500 

TTGATGCTTCAGGTTATCAACTCGATAATGAXTATTTAGCCGGTTTAGATGACCrPTAATC 
28510 28520 28530 28540 28550 28560

AATGGGGGCTTTATGTTACGAAACAAGCCCTTACCGATGCGGGTTATTGGGGCAGTACTG 

28570 28580 28590 28600 28610 28620

CACTAGAAAACTGTGGTGTGATTTTAGGTAATTTGTCATTCCCAACTAAATCATCTAATC 

28630 28640 28650 28660 28670 28680 

AGCTGTTTATGCCTTTGTATCATCAAGTTGTTGATAATGCCTTAAAGGCGGTATTACATC 

28690 28700 28710 28720 28730 28740 
CTGATTTTCAATTAACGCATTACACAGCACCGAAAAAAACACATGCTGACAATGCATTAG 

28750 28760 28770 28780 28790 28800 

TAGCAGGTTATCCAGCTGCATTGATCGCGCAAGCGGCGGGTCTTGGTGGTTCACATTTTG 

28810 28820 28830 28840 28850 28860 

CACTGGATGCGGCTTGTGCTTCATCTTGTTATAGCGTTAAGTTAGCGTGTGATTACCTGC 

28870 28880 28890 28900 28910 28920 

ATACGGGTAAAGCCAACATGATGCTTGCTGGTGCGGTATCTGCAGCAGATCCTATGTTCG 

28930 28940 28950 28960 28970 28980
TAAATATGGGTTTCTCGATATTCCAAGCTTACCCAGCTAACAATGTACATGCCCCGTTTG 

28990 29000 29010 29020 29030 29040

ACCAAAATTCACAAGGTCTATTTGCCGGTGAAGGCGCGGGCATGATGGTATTGAAACGTC

29050 29060 29070 29080 29090 29100 

AAAGTGATGCAGTACGTGATGGTGATCATATTTACGCCATTATTAAAGGCGGCGCATTAT 

29110 29120 29130 29140 29150 29160 

CGAATGACGGTAAAGGCGAGTTTGTATTAAGCCCGAACACCAAGGGCCAAGTATTAGTAT 

29170 29180 29190 29200 29210 29220 
ATGAACGTGCTTATGCCGATGCAGATGTTGACCCGAGTACAGTTGACTATATTGAATGTC 

29230 29240 29250 29260 29270 29280 

ATGCAACGGGCACACCTAAGGGTGACAATGTTGAATTGCGTTCGATGGAAACCTTTTTCA 

29290 29300 29310 29320 29330 29340 
GTCGCGTAAATAACAAACCATTACTGGGCTCGGTTAAATCTAACCTTGGTCATTTGTTAA 

29350 29360 29370 29380 29390 29400 

CTGCCGCTGGTATGCCTGGCATGACCAAAGCTATGTTAGCGCTAGGTAAAGGTCTTATTC 

29410 29420 29430 29440 29450 29460 

CTGCAACGATTAACTTAAAGCAACCACTGCAATCTAAAAACGGTTACTTTACTGGCGAGC 

29470 29480 29490 29500 29510 29520 

AAATGCCAACGACGACTGTGTCTTGGCCAACAACTCCGGGTGCCAAGGCAGATAAACCGC 

29530 29540 29550 29560 29570 29580 

GTACCGCAGGTGTGAGCGTATTTGGTTTTGGTGGCAGCAACGCCCATTTGGTATTACAAC 

29590 29600 29610 29620 29630 29640 
AGCCAACGCAAACACTCGAGACTAATTTTAGTGTTGCTAAACCACGTGAGCCTTTGGCTA 
29650 29660 29670 29680 29690 29700

TTATTGGTATGGACAGCCATTTTGGTAGTGCCAGTAATTTAGCGCAGTTCAAAACCTTAT 

29710 29720 29730 29740 29750 29760 

TAAATAATAATCAAAATACCTTCCGTGAATTACCAGAACAACGCTGGAAAGGCATGGAAA 

29770 29780 29790 29800 29810 29820 
GTAACGCTAACGTCATGCAGTCGTTACAATTACGCAAAGCGCCTAAAGGCAGTTACGTTG 

29830 29840 29850 29860 29870 29880

AACAGCTAGATATTGATTTCTTGCGTTTTAAAGTACCGCCTAATGAAAAAGATTGCTTGA 

29890 29900 29910 29920 29930 29940 

TCCCGCAACAGTTAATGATGATGCAAGTGGCAGACAATGCTGCGAAAGACGGAGGTCTAG 

29950 29960 29970 29980 29990 30000 

TTGAAGGTCGTAATGTTGCGGTATTAGTAGCGATGGGCATGGAACTGGAATTACATCAGT 

30010 30020 30030 30040 30050 30060 

ATCGTGGTCGCGTTAATCTAACCACCCAAATTGAAGACAGCTTATTACAGCAAGGTATTA 

30070 30080 30090 30100 30110 30120 

ACCTGACTGTTGAGCAACGTGAAGAACTGACCAATATTGCTAAAGACGGTGTTGCCTCGG 

30130 30140 30150 30160 30170 30180 

CTGCACAGCTAAATCAGTATACGAGTTTCATTGGTAATATTATGGCGTCACGTATTTCGG 

30190 30200 30210 30220 30230 30240 

CGTTATGGGATTTTTCTGGTCCTGCTATTACCGTATCGGCTGAAGAAAACTCTGTTTATC 

30250 30260 30270 30280 30290 30300 
GTTGTGTTGAATTAGCTGAAAATCTATTTCAAACCAGTGATGTTGAAGCCGTTATTATTG 

30310 30320 30330 30340 30350 30360 

CTGCTGTTGATTTGTCTGGTTCAATTGAAAACATTACTTTACGTCAGCACTACGGTCCAG 

30370 30380 30390 30400 30410 30420 

TTAATGAAAAGGGATCTGTAAGTGAATGTGGTCCGGTTAATGAAAGCAGTTCAGTAACCA 

30430 30440 30450 30460 30470 30480 

ACAATATTCTTGATCAGCAACAATGGCTGGTGGGTGAAGGCGCAGCGGCTATTGTCGTTA 

30490 30500 30510 30520 30530 30540 

AACCGTCATCGCAAGTCACTGCTGAGCAAGTTTATGCGCGTATTGATGCGGTGAGTTTTG 

30550 30560 30570 30580 30590 30600 

CCCCTGGTAGCAATGCGAAAGCAATTACGATTGCAGCGGATAAAGCATTAACACTTGCTG 

30610 30620 30630 30640 30650 30660 

GTATCAGTGCTGCTGATGTAGCTAGTGTTGAAGCACATGCAAGTGGTTTTAGTGCCGAAA 

30670 30680 30690 30700 30710 30720 

ATAATGCTGAAAAAACCGCGTTACCGACTTTATACCCAAGCGCAAGTATCAGTTCGGTGA 

30730 30740 30750 30760 30770 30780 

AAGCCAATATTGGTCATACGTTTAATGCCTCqpGTATGGCGAGTATTATTAAAACGGCGC 
30790 30800 30810 30820 30830 30840

TGCTGTTAGATCAGAATACGAGTCAAGATCAGAAAAGCAAACATATTGCTATTTUVCGGTC 

30850 30860 30870 30880 30890 30900 

TAGGTCGTGATAACAGCTGCGCGCATCTTATCTTATCGAGTTCAGCGCAAGCGCATCAAG 

30910 30920 30930 30940 30950 30960 

TTGCACCAGCGCCTGTATCTGGTATGGCCAAGCAACGCCCACAGTTAGTTAAAACCATCA 

30970 30980 30990 31000 31010 31020

AACTCGGTGGTCAGTTAATTAGCAACGCGATTGTTAACAGTGCGAGTTCATCTTTACACG 

31030 31040 31050 31060 31070 31080 

CTATTAAAGCGCAGTTTGCCGGTAAGCACTTAAACAAAGTTAACCAGCCAGTGATGATGG 

31090 31100 31110 31120 31130 31140 

ATAACCTGAAGCCCCAAGGTATTAGCGCTCATGCAACCAATGAGTATGTGGTGACTGGAG 

31150 31160 31170 31180 31190 31200 

CTGCTAACACTCAAGCTTCTAACATTCAAGCATCTCATGTTCAAGCGTCAAGTCATGCAC 

31210 31220 31230 31240 31250 31260 

AAGAGATAGCACCAAACCAAGTTCAAAATATGCAAGCTACAGCAGCCGCTGTAAGTTCAC 

31270 31280 31290 31300 31310 31320 

CCCTTTCTCAACATCAACACACAGCGCAGCCCGTAGCGGCACCGAGCGTTGTTGGAGTGA 

31330 31340 31350 31360 31370 31380 

CTGTGAAACATAAAGCAAGTAACCAAATTCATCAGCAAGCGTCTACGCATAAAGCATTTT 

31390 31400 31410 31420 31430 31440 

TAGAAAGTCGTTTAGCTGCACAGAAAAACCTATCGCAACTTGTTGAATTGCAAACCAAGC 

31450 31460 31470 31480 31490 31500 

TGTCAATCCAAACTGGTAGTGACAATACATCTAACAATACTGCGTCAACAAGCAATACAG 

31510 31520 31530 31540 31550 31560 

TGCTAACAAATCCTGTATCAGCAACGCCATTAACACTTGTGTCTAATGCGCCTGTAGTAG 

31570 31580 31590 31600 31610 31620 

CGACAAACCTAACCAGTACAGAAGCAAAAGCGCAAGCAGCTGCTACACAAGCTGGTTTTC 

31630 31640 31650 31660 31670 31680 

AGATAAAAGGACCTGTTGGTTACAACTATCCACCGCTGCAGTTAATTGAACGTTATAATA 

31690 31700 31710 31720 31730 31740 

AACCAGAAAACGTGATTTACGATCAAGCTGATTTGGTTGAATTCGCTGAAGGTGATATTG 

31750 31760 31770 31780 31790 31800 

GTAAGGTATTTGGTGCTGAATACAATATTATTGATGGCTATTCGCGTCGTGTACGTCTGC 

31810 31820 31830 31840 31850 31860 
CAACCTCAGATTACTTGTTAGTAACACGTGTTACTGAACTTGATGCCAAGGTGCATGAAT 

31870 31880 31890 31900 31910 31920 
ACAAGAAATCATACATGTGTACTGAATATGATGTGCCTGTTGATGCACCGTTCTTAATTG 
31930 31940 31950 31960 31970 31980
ATGGTCAGATCCCTTGGTCTGTTGCCGTCGAATCAGGCCAGTGTGATTTGATGTTGATTT 

31990 32000 32010 32020 32030 32040 

CATATATCGGTATTGATTTCCAAGCGAAAGGCGAACGTGTTTACCGTTTACTTGATTGTG 

32050 32060 32070 32080 32090 32100

AATTAACTTTCCTTGAAGAGATGGCTTTTGGTGGCGATACTTTACGTTACGAGATCCACA 

32110 32120 32130 32140 32150 32160 

TTGATTCGTATGCACGTAACGGCGAGCAATTATTATTCTTCTTCCATTACGATTGTTACG 

32170 32180 32190 32200 32210 32220 

TAGGGGATAAGAAGGTACTTATCATGCGTAATGGTTGTGCTGGTTTCTTTACTGACGAAG 

32230 32240 32250 32260 32270 32280 

AACTTTCTGATGGTAAAGGCGTTATTCATAACGACAAAGACAAAGCTGAGTTTAGCAATG 

32290 32300 32310 32320 32330 32340 

CTGTTAAATCATCATTCACGCCGTTATTACAACATAACCGTGGTCAATACGATTATAACG 

32350 32360 32370 32380 32390 32400 

ACATGATGAAGTTGGTTAATGGTGATGTTGCCAGTTGTTTTGGTCCGCAATATGATCAAG 

32410 32420 32430 32440 32450 32460 

GTGGCCGTAATCCATCATTGAAATTCTCGTCTGAGAAGTTCTTGATGATTGAACGTATTA 

32470 32480 32490 32500 32510 32520 

CCAAGATAGACCCAACCGGTGGTCATTGGGGACTAGGCCTGTTAGAAGGTCAGAAAGATT 

32530 32540 32550 32560 32570 32580 

TAGACCCTGAGCATTGGTATTTCCCTTGTCACTTTAAAGGTGATCAAGTAATGGCTGGTT 

32590 32600 32610 32620 32630 32640 

CGTTGATGTCGGAAGGTTGTGGCCAAATGGCGATGTTCTTCATGCTGTCTCTTGGTATGC 

32650 32660 32670 32680 32690 32700 

ATACCAATGTGAACAACGCTCGTTTCCAACCACTACCAGGTGAATCACAAACGGTACGTT 

32710 32720 32730 32740 32750 32760 

GTCGTGGGCAAGTACTGCCACAGCGCAATACCTTAACTTACCGTATGGAAGTTACTGCGA 

32770 32780 32790 32800 32810 32820 
TGGGTATGCATCCACAGCCATTCATGAAAGCTAATATTGATATTTTGCTTGACGGTAAAG 

32830 32840 32850 32860 32870 32880 

TGGTTGTTGATTTCAAAAACTTGAGCGTGATGATCAGCGAACAAGATGAGCATTCAGATT 

32890 32900 32910 32920 32930 32940 

ACCCTGTAACACTGCCGAGTAATGTGGCGCTTAAAGCGATTACTGCACCTGTTGCGTCAG 

32950 32960 32970 32980 32990 33000 
TAGCACCAGCATCTTCACCCGCTAACAGCGCGGATCTAGACGAACGTGGTGTTGAACCGT 

33010 33020 33030 33040 33050 33060 

TTAAGTTTCCTGAACGTCCGTTAATGCGTGTTPAGTCAGACTTGTCTGCACCGAAAAGCA 
33070 33080 33090 33100 33110 33120

AAGGTGTGACACCGATTAAGCATTTTGAAGCGCCTGCTGTTGCTGGTCATCATAGAGTGC 

33130 33140 33150 33160 33170 33180 

CTAACCAAGCACCGTTTACACCTTGGCATATGTTTGAGTTTGCGACGGGTAATATTTCTA 

33190 33200 33210 33220 33230 33240 
ACTGTTTCGGTCCTGATTTTGATGTTTATGAAGGTCGTATTCCACCTCGTACACCTrrGTG 

33250 33260 33270 33280 33290 33300 

GCGATTTACAAGTTGTTACTCAGGTTGTAGAAGTGCAGGGCGAACGTCTTGATCTTAAAA 

33310 33320 33330 33340 33350 33360 
ATCCATCAAGCTGTGTAGCTGAATACTATGTACCGGAAGACGCTTGGTACTTTACTAAAA 

33370 33380 33390 33400 33410 33420 

ACAGCCATGAAAACTGGATGCCTTATTCATTAATCATGGAAATTGCATTGCAACCAAATG 

33430 33440 33450 33460 33470 33480 
GCTTTATTTCTGGTTACATGGGCACGACGCTTAAATACCCTGAAAAAGATCTGTTCTTCC 

33490 33500 33510 33520 33530 33540 

GTAACCTTGATGGTAGCGGCACGTTATTAAAGCAGATTGATTTACGCGGCAAGACCATTG 

33550 33560 33570 33580 33590 33600 

TGAATAAATCAGTCTTGGTTAGTACGGCTATTGCTGGTGGCGCGATTATTCAAAGTTTCA 

33610 33620 33630 33640 33650 33660 

CGTTTGATATGTCTGTAGATGGCGAGCTATTTTATACTGGTAAAGCTGTATTOJGGTTACT 

33670 33680 33690 33700 33710 33720 

TTAGTGGTGAATCACTGACTAACC2U\CTGGGCATTGATAACGGTAAAACGACTAATGCGT 

33730 33740 33750 33760 33770 33780 

GGTTTGTTGATAACAATACCCCCGCAGCGAATATTGATGTGTTTGATTTAACTAATCAGT 

33790 33800 33810 33820 33830 33840 

CATTGGCTCTGTATAAAGCGCCTGTGGATAAACCGCATTATAAATTGGCTGGTGGTCAGA 

33850 33860 33870 33880 33890 33900 

TGAACTTTATCGATACAGTGTCAGTGGTTGAAGGCGGTGGTAAAGCGGGCGTGGCTTATG 

33910 33920 33930 33940 33950 33960 
TTTATGGCGAACGTACGATTGATGCTGATGATTGGTTCTTCCGTTATCACTTCCACCAAG 

33970 33980 33990 34000 34010 34020 

ATCCGGTGATGCCAGGTTCATTAGGTGTTGAAGCTATTATTGAGTTGATGCAGACCTATG 

34030 34040 34050 34060 34070 34080 

CGCTTAAAAATGATTTGGGTGGCAAGTTTGCTAACCCACGTTTCATTGCGCCGATGACGC 

34090 34100 34110 34120 34130 34140 
AAGTTGATTGGAAATACCGTGGGCAAATTACGCCGCTGAATAAACAGATGTCACTGGACG 

34150 34160 34170 34180 34190 34200 

TGCATATCACTGAGATCGTGAATGACGCTGG15AAGTGCGAATCGTTGGTGATGCGAATC 



wo 98/55625 


65 / 106 


PCT/US98/11639 


34210 34220 34230 34240 34250 34260 

TGTCTiUUVGATGGTCTGCGTATTTATGAAGTTAAAAACATCGTTTTAAGTATTGTTGAAG 

34270 34280 34290 34300 34310 34320 

CGTAAAGGGTCAAGTGTAACGTGCTTAAGCGCCGCATTGGTTAAAGACGCTTTGCACGCC 

34330 34340 34350 34360 34370 34380 
GTGAATCCGTCCATGGAGGCTTGGGGTTGGCATCCATGCCAACAACAGCAAGCTTACTTT 

34390 34400 34410 34420 34430 34440 

AATCAATACGGCTTGGTGTCCATTTAGACGCCTCGAACTTAGTAGTTAATAGACAAAATA 

34450 34460 34470 34480 34490 34500 

ATTTAGCTGTGGAATGAATATAGTAAGTAATCATTCGGCAGCTACAAAAAAGGAATTAAG 

34510 34520 34530 34540 34550 34560 

AATGTCGAGTTTAGGTTTTAACAATAACAACGCAATTAACTGGGCTTGGAAAGTAGATCC 

34570 34580 34590 34600 34610 34620 

AGCGTCAGTTCATACACAAGATGCAGAAATTAAAGCAGCTTTAATGGATCTAACTAAACC 

34630 34640 34650 34660 34670 34680 

TCTCTATGTGGCGAATAATTCAGGCGTAACTGGTATAGCTAATCATACGTCAGTAGCAGG 

34690 34700 34710 34720 34730 34740 
TGCGATCAGCAATAACATCGATGTTGATGTATTGGCGTTTGCGCAAAAGTTAAACCCAGA 

34750 34760 34770 34780 34790 34800 

AGATCTGGGTGATGATGCTTACAAGAAACAGCACGGCGTTAAATATGCTTATCATGGCGG 

34810 34820 34830 34840 34850 34860 

TGCGATGGCAAATGGTATTGCCTCGGTTGAATTGGTTGTTGCGTTAGGTAAAGCAGGGCT 

34870 34880 34890 34900 34910 34920 

GTTATGTTCATTTGGTGCTGCAGGTCTAGTGCCTGATGCGGTTGAAGATGCAATTCGTCG 

34930 34940 34950 34960 34970 34980 

TATTCAAGCTGAATTACCAAATGGCCCTTATGCGGTTAACTTGATCCATGCACCAGCAGA 

34990 35000 35010 35020 35030 35040 

AGAAGCATTAGAGCGTGGCGCGGTTGAACGTTTCCTAAAACTTGGCGTCAAGACGGTAGA 

35050 35060 35070 35080 35090 35100 

GGCTTCAGCTTACCTTGGTTTAACTGAACACATTGTTTGGTATCGTGCTGCTGGTCTAAC 

35110 35120 35130 35140 35150 35160 

TAAAAACGCAGATGGCAGTGTTAATATCGGTAACAAGGTTATCGCTAAAGTATCGCGTAC 

35170 35180 35190 35200 35210 35220 
CGAAGTTGGTCGCCGCTTTATGGAACCTGCACCGCAAAAATTACTGGATAAGTTATTAGA 

35230 35240 35250 35260 35270 35280 
ACAAAATAAGATCACCCCTGAACAAGCTGCTTTAGCGTTGCTTGTACCTATGGCTGATGA 

35290 35300 35310 35320 35330 35340 

TATTACTGGGGAAGCGGATTCTGGTGGTCATACAGATAACCGTCCGTTTTTAACATTATT 


35350 35360 35370 35380 35390 35400 

ACCGACGATTATTGGICTGCGTGATGAAGTGCAAGCGAAGTATAACTTCTCTCCTGCATT 

35410 35420 35430 35440 35450 35460 

ACGTGTTGGTGCTGGTGGTGGTATCGGAACGCCTGAAGCAGCACTCGCTGCATTTAACAT 

35470 35480 35490 35500 35510 35520 
GGGCGCGGCTTATATCGTTCTGGGTTCTGTGAATCAGGCGTGTGTTGAAGCGGGTGCATC 

35530 35540 35550 35560 35570 35580 

TGAATATACTCGTAAACTGTTATCGACAGTTGAAATGGCTGATGTGACTATGGCACCTGC 

35590 35600 35610 35620 35630 35640 

TGCAGATATGTTTGAAATGGGTGTGAAGCTGCAAGTATTAAAACGCGGTTCTATGTTCGC 

35650 35660 35670 35680 35690 35700 

GATGCGTGCGAAGAAACTGTATGACTTGTATGTGGCTTATGACTCGATTGAAGATATCCC 

35710 35720 35730 35740 35750 35760 

AGCTGCTGAACGTGAGAAGATTGAAAAACAAATCTTCCGTGCAAACCTAGACGAGATTTG 

35770 35780 35790 35800 35810 35820 

GGATGGCACTATCGCTTTCTTTACTGAACGCGATCCAGAAATGCTAGCCCGTGCAACGAG 

35830 35840 35850 35860 35870 35880 

TAGTCCTAAACGTAAAATGGCACTTATCTTCCGTTGGTATCTTGGCCTTTCTTCACGCTG 

35890 35900 35910 35920 35930 35940 

GTCAAACACAGGCGAGAAGGGACGTGAAATGGATTATCAGATTTGGGCAGGCCCAAGTTT 

35950 35960 35970 35980 35990 36000 

AGGTGCATTCAACAGCTGGGTGAAAGGTTCTTACCTTGAAGACTATACCCGCCGTGGCGC 

36010 36020 36030 36040 36050 36060 

TGTAGATGTTGCTTTGCATATGCTTAAAGGTGCTGCGTATTTACAACGTGTAAACCAGTT 

36070 36080 36090 36100 36110 36120 

GAAATTGCAAGGTGTTAGCTTAAGTACAGAATTGGCAAGTTATCGTACGAGTGATTAATG 

36130 36140 36150 36160 36170 36180 

TTACTTGATGATATGTGAATTAATTAAAGCGCCTGAGGGCGCTTTTTTTGGTTTTTAACT 

36190 36200 36210 36220 36230 36240 

CAGGTGTTGTAACTCGAAATTGCCCCTTTCAAGTTAGATCGATTACTCACTCACAATATG 

36250 36260 36270 36280 36290 36300 

TTGATATCGCACTTGCCATATACTTGCTCATCCAAAGCCCTATATTGATAATGGTGTTAA 

36310 36320 36330 36340 36350 36360 

TAGTCTTTAATATCCGAGTCTTTCTTCAGCATAATACTAATATAGAGACTCGACCAATGT 

36370 36380 36390 36400 36410 36420 

TAAACACAACAAAGAATATATTCTTGTGTACTGCCTTATTATTAACGAGTGCGAGTACGA 

36430 36440 36450 36460 36470 36480 

CAGCTACTACGCTAAACAATTCGATATCAGCARTTGAACAACGTATTTCTGGTCGTATCG 
36490 36500 36510 36520 36530 36540

GTGTGGCTGTTTTAGATACGCAAAATAAACAAACGTGGGCTTACAATGGTGATGCACATT 

36550 36560 36570 36580 36590 36600 

TTCCGATGATGAGTACATTCAAAACCCTCGCTTGCGCGAAAATGCTAAGTGAATCGACAA 

36610 36620 36630 36640 36650 36660 

ATGGTAATCTGGATCCCAGTACTAGCTCATTGATAAAGGCTGAAGAATTAATCCCTTGGT 

36670 36680 36690 36700 36710 36720 
CACCAGTCACTAAAACGTTTGTGAATAACACTAXTACAGTGGCGAAAGCGTGTGAAGCAA 

36730 36740 36750 36760 36770 36780 

CAATGCTGACCAGTGATAATACCGCGGCTAATATTGTTTTACAGTATATCGGAGGCCCTC 

36790 36800 36810 36820 36830 36840 

AAGGCGTTACTGCATTCTTGCGAGAAATTGGTGATGAAGAGAGTCAGTTAGATCGTATAG 

36850 36860 36870 36880 36890 36900 

AACCTGAATTGAATGAAGCTAAGGTCGGAGACTTGCGTGATACCACGACACCGAAAGCCA 

36910 36920 36930 36940 36950 36960 

TAGTTACCACGCTCAACAAACTACTACTTGGTGATGTTCTACTTGATTTGGATAAAAACC 

36970 36980 36990 37000 37010 37020

AACTTAAAACATGGATGCAAAATAATAAAGTGTCAGATCCTTTACTGCGTTCTATATTAC 

37030 37040 37050 37060 37070 37080 

CGCAAGGCTGGTTTATTGCCGACCGCTCAGGTGCGGGTGGTAATGGTTCTCGAGGTATAA 

37090 37100 37110 37120 37130 37140 

CTGCTATGCTTTGGCACTCCGAGCGTCAACCGCTAATCATCAGTATTTATTTAACCGAAA 

37150 37160 37170 37180 37190 37200 

CTGAGTTAGCAATGGCAATGCGCAATGAGATTATTGTTGAGATCGGTAAGCTGATATTCA 

37210 37220 37230 37240 37250 37260 

AAGAATACGCGGTGAAATAATAAGTTATTTTTTGATAATACTTTAACGAGCGTAGCTATC 

37270 37280 37290 37300 37310 37320 

GAAGTGAGGGCGTCAATTAGACACCTTTGCTTCCCCTACAAAATCTAATGTGTATTACCT 

37330 37340 37350 37360 37370 37380 

CGGCTAGTACAATTGCCCTAAGTTATTTCTGTCCAGCTTTGGCTTAGTGCAATTGCGTTA 

37390 37400 37410 37420 37430 37440 

GCCAATGTGAACACCAAGGGACTTTGTCGTACCATAACTACCAAGCGACTTTGTCGTTTT 

37450 37460 37470 37480 37490 37500 

TATCTTTTCTTAGACAAACAGAGGTTAAATGAGTGACGCCTTCCAAATCACAGGAATGAA 

37510 37520 37530 37540 37550 37560 
TCCGCATTTCAATAAAATCTAACCCGTACCAACTCCGTACAAGTTGATCTTTAGTTGTTT 

37570 37580 37590 37600 37610 37620 

AAAATCTATAATAAATTCAATTACGGAATTAATCCGTACAACTGGAGGTTTTATGGCTAC 
37630 37640 37650 37660 37670 37680

TGCAAGACTTGATATCCGTTTGGATGAAGAAATCAAAGCTAAGGCTGAGAAAGCATCAGC 

37690 37700 37710 37720 37730 37740 

TTTACTCGGCTTAAAAAGTTTAACCGAATACGTTGTTCGCTTAATGGACGAAGATTCAAC 

37750 37760 37770 37780 37790 37800 

TAAAGTAGTTTCTGAGCATGAGAGTATTACCGTTGAAGCGAATGTATTCGACCAATTTAT 

37810 37820 37830 37840 37850 37860 

GGCTGCTTGTGATGAAGCGAAAGCCCCAAATAAAGCATTACTTGAAGCCGCTGTATTTAC 

37870 37880 37890 37900 37910 37920

TCAGAATGGTGAGTTTAAGTGAGTTATTCCAAACGTTTCAAAGAACTGGATAAATCAAAA 

37930 37940 37950 37960 37970 37980 

CATGACAGAGCATCATTTGACTGTGGCGAAAAAGAGCTAAATGATTTTATCCAAACTCAA 

37990 38000 38010 38020 38030 38040

GCAGCCAAACATATGCAAGCAGGTATTAGCCGCACTCTGGTTTTACCTGCTTCTGCGCCG

38050 38060 38070 38080 38090 38100 

TTACCAAACAAAAAATATCCAATTTGCTCATTTTATAGTATCGCGCCAAGCTCAATTAGC 

38110 38120 38130 38140 38150 38160 

CGCGATACGTTACCACAAGCAATGGCTAAAAAGTTACCACGTTATCCTATCCCTGTTTTT 

38170 38180 38190 38200 38210 38220 

CTTTTGGCTCAACTTGCCGTCCATAAAGAGTTTCATGGGAGTGGGTTAGGCAAAGTTAGC 

38230 38240 38250 38260 38270 38280 

TTAATTAAAGCGTTAGAGTACCTTTGGGAAATTAACTCTCACATGAGAGCTTACGCCATC 

38290 38300 38310 38320 38330 38340 

GTTGTTGATTGTTTAACTGAACAAGCTGAGTCATTCTACGCTAAATATGGTTTCGACGTT 

38350 38360 38370 38380 38390 38400 

CTCTGCGAAATAAATGGTCGAGTAAGAATGTTCATATCAATGAAAACAGTCAATCAGTTA 

38410 38420 38430 38440 38450 38460 

TTCACTTAACAGTAAGAGTTAGTATAACAGTTGTATGAATTAAATTTATTATATTCGGTA 

38470 38480 38490 38500 38510 38520 

ATCTCATTGCGATCACGCTAGAAGTGCGAGCGGGTCAGACCGAGGCCACAATAGCAGCCG 

38530 38540 38550 38560 38570 38580 

TTACGTTTAGGGGATGACTTAAAAAGATAACTACTACGTCAGTGGCGATCCTAGAGGATT 

38590 38600 38610 38620 38630 38640 
AAAGGTTTATGATTCACAACATTTATTTATTGTGCTTAATTTTTTCTATCCAATATGCGC 

38650 38660 38670 38680 38690 38700 

AAGCTGTAAATATCACTGAAGTAGACTTTTATGTCAGTGATGATATCCCTAAAGATGTTG 

38710 38720 38730 38740 38750 38760 

CCAAATTAAAGATAGGTGAATCCATAACGAACTCCAGCCTTATTCTAAGTAACTCATCTA 
38770 38780 38790 38800 38810 38820

TTCCACTCTCGCGGGAGACGGGTAACATATATTACTCTTCATCAATTGCTAACTTGAACT 

38830 38840 38850 38860 38870 38880 
ATGACTCGATAGAATTTGTTATGGCTCAATTGATGGCCGAAGATTCCAGCCTTTACAAGA 

38890 38900 38910 38920 38930 38940 
TGCTGGTAAATAGCGATAGGTTGTCCGTGCTAGTAATGACATCTTCCCAGTCCACAGATC 

38950 38960 38970 38980 38990 39000 

TCTATGGCTCGACTTACTCGGCTTATTTTCCTAATGTTGCGGTCATCGATTTGAATTGTG 

39010 39020 39030 39040 39050 39060 

ACTCGCTAACTTTAGAACATGAGCTCGGCCATCTATACGGAGCTGAACATGAAGAAATAT 

39070 39080 39090 39100 39110 39120 

ATGACGACTATGTCTTCTATGCTGCGATATGTGGAGACTATACGACTATCATGAACTCTA 

39130 39140 39150 39160 39170 39180 

TGCAGCCTGAAATGAAAGAAAAACAAATGATAAAGGCATATTCATTCCCTGAATTAAAAG 

39190 39200 39210 39220 39230 39240 

TGGATGGCTTGCAGTGCGGAAATGAAAATACGAATAACAAAAAGGTTATTTTAGACAATA 

39250 39260 39270 39280 39290 39300 

TTGGTCGGTTTAGATAGGATTGGGATATTATTCTCATTCGGCTCTACTTAGTGCTGTTAT 

39310 39320 39330 39340 39350 39360 

TATGAGTGCCAGTGCTTCTATCTACGATATTGGTCTTAACAAGTATTTATCTATAGACGC 

39370 39380 39390 39400 39410 39420 

TAAGGTGTTATGTATTTAAGGGATGTTCAAGATGAAACTAGGTGTAAACGATGTATAGTT 

39430 39440 39450 39460 39470 39480 

GTATAACATTTTTTCAACGGTTGGAACGTTCGATTCTATCGGGTAACAAGACCGCGACGA 

39490 39500 39510 39520 39530 39540 

TCCGCGATAAGTCCGATAGTCATTACTTAGTTGGTCAGATGTTAGATGCTTGTACTCACG 

39550 39560 39570 39580 39590 39600 

AAGATAATCGGAAAATGTGTCAAATAGAAATACTGAGCATTGAATATGTGACGTTTAGTG 

39610 39620 39630 39640 39650 39660 

AATTAAACCGTGCGCACGCCAATGCTGAAGGTTTACCGTTTTTGTTTATGCTTAAGTGGA 

39670 39680 39690 39700 39710 39720 

TAGTTCGAAAGATTTATCCGACTTCAAATGATTTATTTTTCATAAGTTTCAGAGTTGTAA 

39730 39740 39750 39760 39770 39780 

CTATCGATATCTTATAAGTCTTAGTGCACAAAACAGAACTATTTATAGCGCTCAAGAAGG 

39790 39800 39810 39820 39830 39840 
CGATAATTTGATAATGAATTATCGCCTTGTTACTATTAAGAGACTTTAAATGACTGAGAT 

39850 39860 39870 39880 39890 39900 
ATAAGATATGACACGGAAGAACATATTGATCACAGGCGCAAGTTCAGGGTTGGGCCGAGG 
39910 39920 39930 39940 39950 39960

TATGGCCATCGAATTTGCAAAATCAGGTCATAACTTAGCACTTTGTGCACGTAGACTTGA 

39970 39980 39990 40000 40010 40020 

TAATTTAGTTGCACTGAAAGCAGAACTCTTAGCCCTCAATCCTCACATCCAAATCGAAAT 

40030 40040 40050 40060 40070 40080 

AAAACCTCTTGATGTCAATGAACATGAACAAGTCTTCACTGTTrTTCCATGAATTCAAAGC 

40090 40100 40110 40120 40130 

TGAATTTGGTACGCTTGATCGTATTATTGTTAATGCTGGATTAGGCAAGGGTGGATCC 
10 20 30 40 50 60

AAATGCAATTAATTATGGCGTAAATAGAGTGAAAACATGGCTAATATTCACTAAGTCCTG 

70 80 90 100 110 120 

AATTTTATATAAAGTTTAATCTGTTATTTTAGCGTTTACCTGGTCTTATCAGTGAGGTTT 

130 140 150 160 170 180 

ATAGCCATTATTAGTGGGATTGAAGTGATTTTTAAAGCTATGTATATTATTGCAAATATA 

190 200 210 220 230 240 

AATTGTAACAATTAAGACTTTGGACACTTGAGTTCAATTTCGAATTGATTGGCATAAAAT 

250 260 270 280 290 300 

TTAAAACAGCTAAATCTACCTCAATCATTTTAGCAAATGTATGCAGGTAGATTTTTTTCG 

310 320 330 340 350 360 

CCATTTAAGAGTACACTTGTACGCTAGGTTTTTGTTTAGTGTGCAAATGAACGTTTTGAT 

370 380 390 400 410 420 

GAGCATTGTTTTTAGAGCACAAAATAGATCCTTACAGGAGCAATAACGCAATGGCTAAAA 

430 440 450 460 470 480 

AGAACACCACATCGATTAAGCACGCCAAGGATGTGTTAAGTAGTGATGATCAACAGTTAA 

490 500 510 520 530 540 

ATTCTCGCTTGCAAGAATGTCCGATTGCCATCATTGGTATGGCATCGGTTTTTGCAGATG 

550 560 570 580 590 600 

CTAAAAACTTGGATCAATTCTGGGATAACATCGTTGACTCTGTGGACGCTATTATTGATG 

610 620 630 640 650 660

TGCCTAGCGATCGCTGGAACATTGACGACCATTACTCGGCTGATAAAAAAGCAGCTGACA 

670 680 690 700 710 720 

AGACATACTGCAAACGCGGTGGTTTCATTCCAGAGCTTGATTTTGATCCGATGGAGTTTG 

730 740 750 760 770 780 

GTTTACCGCCAAATATCCTCGAGTTAACTGACATCGCTCAATTGTTGTCATTAATTGTTG 

790 800 810 820 830 840 

CTCGTGATGTATTAAGTGATGCTGGCATTGGTAGTGATTATGACCATGATAAAATTGGTA 

850 860 870 880 890 900

TCACGCTGGGTGTCGGTGGTGGTCAGAAACAAATTTCGCCATTAACGTCGCGCCTACAAG 

910 920 930 940 950 960 

GCCCGGTATTAGAAAAAGTATTAAAAGCCTCAGGCATTGATGAAGATGATCGCGCTATGA 

970 980 990 1000 1010 1020 

TCATCGACAAATTTAAAAAAGCCTACATCGGCTGGGAAGAGAACTCATTCCCAGGCATGC 

1030 1040 1050 1060 1070 1080 

TAGGTAACGTTATTGCTGGTCGTATCGCCAATCGTTTTGATTTTGGTGGTACTAACTGTG 

1090 1100 1110 1120 1130 1140 

TGGTTGATGCGGCATGCGCTGGCTCCCTTGCASCTGTTAAAATGGCGATCTCAGACTTAC 
1150 1160 1170 1180 1190 1200

TTGAATATCGTTCAGAAGTCATGATATCGGGTGGTGTATGTTGTGATAACTCGCCATTCA 

1210 1220 1230 1240 1250 1260 

TGTATATGTCATTCTCGAAAACACCAGCATTTACCACCAATGATGATATCCGTCCGTTTG 

1270 1280 1290 1300 1310 1320 

ATGACGATTCAAAAGGCATGCTGGTTGGTGAAGGTATTGGCATGATGGCGTTTAAACGTC 

1330 1340 1350 1360 1370 1380 

TTGAAGATGCTGAACGTGACGGCGACAAAATTTATTCTGTACTGAAAGGTATCGGTACAT 

1390 1400 1410 1420 1430 1440 

CTTCAGATGGTCGTTTCAAATCTATTTACGCTCCACGCCCAGATGGCCAAGCAAAAGCGC

1450 1460 1470 1480 1490 1500 

TAAAACGTGCTTATGAAGATGCCGGTTTTGCCCCTGAAACATGTGGTCTAATTGAAGGCC 

1510 1520 1530 1540 1550 1560 

ATGGTACGGGTACCAAAGCGGGTGATGCCGCAGAATTTGCTGGCTTGACCAAACACTTTG 

1570 1580 1590 1600 1610 1620 

GCGCCGCCAGTGATGAAAAGCAATATATCGCCTTAGGCTCAGTTAAATCGCAAATTGGTC 

1630 1640 1650 1660 1670 1680 

ATACTAAATCTGCGGCTGGCTCTGCGGGTATGATTAAGGCGGCATTAGCGCTGCATCATA 

1690 1700 1710 1720 1730 1740 

AAATCTTACCTGCAACGATCCATATCGATAAACCAAGTGAAGCCTTGGATATCAAAAACA 

1750 1760 1770 1780 1790 1800 

GCCCGTTATACCTAAACAGCGAAACGCGTCCTTGGATGCCACGTGAAGATGGTATTCCAC 

1810 1820 1830 1840 1850 1860

GTCGTGCAGGTATCAGCTCATTTGGTTTTGGCGGCACCAACTTCCATATTATTTTAGAAG 

1870 1880 1890 1900 1910 1920 

AGTATCGCCCAGGTCACGATAGCGCATATCGCTTAAACTCAGTGAGCCAAACTGTGTTGA 

1930 1940 1950 1960 1970 1980

TCTCGGCAAACGACCAACAAGGTATTGTTGCTGAGTTAAATAACTGGCGTACTAAACTGG 

1990 2000 2010 2020 2030 2040 

CTGTCGATGCTGATCATCAAGGGTTTGTATTTAATGAGTTAGTGACAACGTGGCCATTAA 

2050 2060 2070 2080 2090 2100 

AAACCCCATCCGTTAACCAAGCTCGTTTAGGTTTTGTTGCGCGTAATGCAAATGAAGCGA 

2110 2120 2130 2140 2150 2160 

TCGCGATGATTGATACGGCATTGAAACAATTCAATGCGAACGCAGATAAAATGACATGGT 

2170 2180 2190 2200 2210 2220 

CAGTACCTACCGGGGTTTACTATCGTCAAGCCGGTATTGATGCAACAGGTAAAGTGGTTG 

2230 2240 2250 2260 2270 2280 

CGCTATTCTCAGGGCAAGGTTCGCAATACGTGAACATGGGTCGTGAATTAACCTGTAACT 
2290 2300 2310 2320 2330 2340

TCCCAAGCATGATGCACAGTGCTGCGGCGATGGATAAAGAGTTCAGTGCCGCTGGTTTAG 

2350 2360 2370 2380 2390 2400

GCCAGTTATCTGCAGTTACTTTCCCTATCCCTGTTTATACGGATGCCGAGCGTAAGCTAC 

2410 2420 2430 2440 2450 2460 

AAGAAGAGCAATTACGTTTAACGCAACATGCGCAACCAGCGATTGGTAGTTTGAGTGTTG 

2470 2480 2490 2500 2510 2520 

GTCTGTTCAAAACGTTTAAGCAAGCAGGTTTTAAAGCTGATTTTGCTGCCGGTCATAGTT 

2530 2540 2550 2560 2570 2580 

TCGGTGAGTTAACCGCATTATGGGCTGCCGATGTATTGAGCGAAAGCGATTACATGATGT 

2590 2600 2610 2620 2630 2640 

TAGCGCGTAGTCGTGGTCAAGCAATGGCTGCGCCAGAGCAACAAGATTTTGATGCAGGTA 

2650 2660 2670 2680 2690 2700 

AGATGGCCGCTGTTGTTGGTGATCCAAAGCAAGTCGCTGTGATCATTGATACCCTTGATG 

2710 2720 2730 2740 2750 2760 

ATGTCTCTATTGCTAACTTCAACTCGAATAACCAAGTTGTTATTGCTGGTACTACGGAGC 

2770 2780 2790 2800 2810 2820 

AGGTTGCTGTAGCGGTTACAACCTTAGGTAATGCTGGTTTCAAAGTTGTGCCACTGCCGG 

2830 2840 2850 2860 2870 2880 

TATCTGCTGCGTTCCATACACCTTTAGTTCGTCACGCGCAAAAACCATTTGCTAAAGCGG 

2890 2900 2910 2920 2930 2940 

TTGATAGCGCTAAATTTAAAGCGCCAAGCArrCCAGTGTTTGCTAATGGCACAGGCTTGG 

2950 2960 2970 2980 2990 3000 

TGCATTCAAGCAAACCGAATGACATTAAGAAAAACCTGAAAAACCACATGCTGGAATCTG 

3010 3020 3030 3040 3050 3060 

TTCATTTCAATCAAGAAATTGACAACATCTATGCTGATGGTGGCCGCGTATTTATCGAAT 

3070 3080 3090 3100 3110 3120 

TTGGTCCAAAGAATGTATTAACTAAATTGGTTGAAAACATTCTCACTGAAAAATCTGATG 

3130 3140 3150 3160 3170 3180 

TGACTGCTATCGCGGTTAATGCTAATCCTAAACAACCTGCGGACGTACAAATGCGCCAAG 

3190 3200 3210 3220 3230 3240 

CTGCGCTGCAAATGGCAGTGCTTGGTGTCGCATTAGACAATATTGACCCGTACGACGCCG 

3250 3260 3270 3280 3290 3300 

TTAAGCGTCCACTTGTTGCGCCGAAAGCATCACCAATGTTGATGAAGTTATCTGCAGCGT 

3310 3320 3330 3340 3350 3360 

CTTATGTTAGTCCGAAAACGAAGAAAGCGTTTGCTGATGCATTGACTGATGGCTGGACTG 

3370 3380 3390 3400 3410 3420 

TTAAGCAAGCGAAAGCTGTACCTGCTGTTGTGTCACAACCACAAGTGATTGAAAAGATCG 
3430 3440 3450 3460 3470 3480

TTGAAGTTGAAAAGATAGTTGAACGCATTGTCGAAGTAGAGCGTATTGTCGAAGTAGAAA 

3490 3500 3510 3520 3530 3540

AAATCGTCTACGTTAATGCTGACGGTTCGCTTATATCGCAAAATAATCAAGACGTTAACA 

3550 3560 3570 3580 3590 3600 

GCGCTGTTGTTAGCAACGTGACTAATAGCTCAGTGACTCATAGCAGTGATGCTGACCTTG 

3610 3620 3630 3640 3650 3660 

TTGCCTCTATTGAACGCAGTGTTGGTCAATTTGTTGCACACCAACAGCAATTATTAAATG 

3670 3680 3690 3700 3710 3720 

TACATGAACAGTTTATGCAAGGTCCACAAGACTACGCGAAAACAGTGCAGAACGTACTTG

3730 3740 3750 3760 3770 3780 

CTGCGCAGACGAGCAATGAATTACCGGAAAGTTTAGACCGTACATTGTCTATGTATAACG 

3790 3800 3810 3820 3830 3840 

AGTTCCAATCAGAAACGCTACGTGTACATGAAACGTACCTGAACAATCAGACGAGCAACA 

3850 3860 3870 3880 3890 3900 

TGAACACCATGCTTACTGGTGCTGAAGCTGATGTGCTAGCAACCCCAATAACTCAGGTAG 

3910 3920 3930 3940 3950 3960 

TGTU^TACAGCCGTTGCCACTAGTCACAAGGTAGTTGCTCCAGTTATTGCTAATACAGTGA 

3970 3980 3990 4000 4010 4020 

CGAATGTTGTATCTAGTGTCAGTAATAACGCGGCGGTTGCAGTGCAAACTGTGGCATTAG 

4030 4040 4050 4060 4070 4080 

CGCCTACGCAAGAAATCGCTCCAACAGTCGCTACTACGCCAGCACCCGCATTGGTTGCTA 

4090 4100 4110 4120 4130 4140 

TCGTGGCTGAACCTGTGATTGTTGCGCATGTTGCTACAGAAGTTGCACCAATTACACCAT 

4150 4160 4170 4180 4190 4200 

CAGTTACACCAGTTGTCGCAACTCAAGCGGCTATCGATGTAGCAACTATTAACAAAGTAA 

4210 4220 4230 4240 4250 4260 

TGTTAGAAGTTGTTGCTGATAAAACCGGTTATCCAACGGATATGCTGGAACTGAGCATGG 

4270 4280 4290 4300 4310 4320 

ACATGGAAGCTGACTTAGGTATCGACTCAATCAAACGTGTTGAGATATTAGGCGCAGTAC 

4330 4340 4350 4360 4370 4380 

AGGAATTGATCCCTGACTTACCTGAACTTAATCCTGAAGATCTTGCTGAGCTACGCACGC 

4390 4400 4410 4420 4430 4440 

TTGGTGAGATTGTCGATTACATGAATTCAAAAGCCCAGGCTGTAGCTCCTACAACAGTAC 

4450 4460 4470 4480 4490 4500 

CTGTAACAAGTGCACCTGTTTCGCCTGCATCTGCTGGTATTGATTTAGCCCACATCCAAA 

4510 4520 4530 4540 4550 4560 

ACGTAATGTTAGAAGTGGTTGCAGACAAAACGGGTTACCCAACAGACATGCTAGAACTGA 
4570 4580 4590 4600 4610 4620

GCATGGATATGGAAGCTGACTTAGGTATTGATTCAATCAAGCGTGTGGAAATCTTAGGTG 

4630 4640 4650 4660 4670 4680

CAGTACAGGAGATCATAACTGATTTACCTGAGCTAAACCCTGAAGATCTTGCTGAATTAC 

4690 4700 4710 4720 4730 4740 

GCACCCTAGGTGAAATCGTTAGTTACATGCAAAGCAAAGCGCCAGTCGCTGAAAGTGCGC 

4750 4760 4770 4780 4790 4800 

CAGTGGCGACGGCTCCTGTAGCAACAAGCTCAGCACCGTCTATCGATTTGAACCACATTC 

4810 4820 4830 4840 4850 4860 

AAACAGTGATGATGGATGTAGTTGCAGATAAGACTGGTTATCCAACTGACATGCTAGAAC 

4870 4880 4890 4900 4910 4920 

TTGGCATGGACATGGAAGCTGATTTAGGTATCGATTCAATCAAACGTGTGGAAATATTAG 

4930 4940 4950 4960 4970 4980 

GCGCAGTGCAGGAGATCATCACTGATTTACCTGAGCTAAACCCAGAAGACCTCGCTGAAT 

4990 5000 5010 5020 5030 5040 

TACGCACGCTAGGTGAAATCGTTAGTTACATGCAAAGCAAAGCGCCAGTCGCTGAGAGTG 

5050 5060 5070 5080 5090 5100 

CGCCAGTAGCGACGGCTTCTGTAGCAACAAGCTCTGCACCGTCTATCGATTTAAACCATA 

5110 5120 5130 5140 5150 5160 

TCCAAACAGTGATGATGGAAGTGGTTGCAGACAAAACCGGTTATCCAGTAGACATGTTAG 

5170 5180 5190 5200 5210 5220 

AACTTGCTATGGACATGGAAGCTGACCTAGGTATCGATTCAATCAAGCGTGTAGAAATTT 

5230 5240 5250 5260 5270 5280 

TAGGTGCGGTACAGGAAATCATTACTGACTTACCTGAGCTTAACCCTGAAGATCTTGCTG 

5290 5300 5310 5320 5330 5340 

AACTACGTACATTAGGTGAAATCGTTAGTTACATGCAAAGCAAAGCGCCCGTAGCTGAAG 

5350 5360 5370 5380 5390 5400 

CGCCTGCAGTACCTGTTGCAGTAGAAAGTGCACCTACTAGTGTAACAAGCTCAGCACCGT 

5410 5420 5430 5440 5450 5460 

CTATCGATTTAGACCACATCCAAAATGTAATGATGGATGTTGTTGCTGATAAGACTGGTT 

5470 5480 5490 5500 5510 5520 

ATCCTGCCAATATGCTTGAATTAGCAATGGACATGGAAGCCGACCTTGGTATTGATTCAA 

5530 5540 5550 5560 5570 5580 

TCAAGCGTGTTGAAATTCTAGGCGCGGTACAGGAGATCATTACTGATTTACCTGAACTAA 

5590 5600 5610 5620 5630 5640 

ACCCAGAAGACTTAGCTGAACTACGTACGTTAGAAGAAATTGTAACCTACATGCAAAGCA 

5650 5660 5670 5680 5690 5700

AGGCGAGTGGTGTTACTGTAAATGTAGTGGCl^AGCCCTGAAAATAATGCTGTATCAGATG 


wo 98/55625 


76 / 106 


PCT/US98/11639 


5710 5720 5730 5740 5750 5760


CATTTATGCAAAGCAATGTGGCGACTATCACAGCGGCCGCAGAACATAAGGCGGAATTTA 

5770 5780 5790 5800 5810 5820 

AACCGGCGCCGAGCGCAACCGTTGCTATCTCTCGTCTAAGCTCTATCAGTAAAATAAGCC 

5830 5840 5850 5860 5870 5880 

AAGATTGTAAAGGTGCTAACGCCTTAATCGTAGCTGATGGCACTGATAATGCTGTGTTAC 

5890 5900 5910 5920 5930 5940 

TTGCAGACCACCTATTGCAAACTGGCTGGAATGTAACTGCATTGCAACCAACTTGGGTAG 

5950 5960 5970 5980 5990 6000 

CTGTAACAACGACGAAAGCATTTAATAAGTCAGTGAACCTGGTGACTTTAAATGGCGTTG 

6010 6020 6030 6040 6050 6060 

ATGAAACTGAAATCAACAACATTATTACTGCTAACGCACAATTGGATGCAGTTATCTATC 

6070 6080 6090 6100 6110 6120 

TGCACGCAAGTAGCGAAATTAATGCTATCGAATACCCACAAGCATCTAAGCAAGGCCTGA 

6130 6140 6150 6160 6170 6180 

TGTTAGCCTTCTTATTAGCGAAATTGAGTAAAGTAACTCAAGCCGCTAAAGTGCGTGGCG 

6190 6200 6210 6220 6230 6240 

CCTTTATGATTGTTACTCAGCAGGGTGGTTCATTAGGTTTTGATGATATCGATTCTGCTA 

6250 6260 6270 6280 6290 6300 

CAAGTCATGATGTGAAAACAGACCTAGTACAAAGCGGCTTAAACGGTTTAGTTAAGACAC 

6310 6320 6330 6340 6350 6360

TGTCTCACGAGTGGGATAACGTATTCTGTCGTGCGGTTGATATTGCTTCGTCATTAACGG 

6370 6380 6390 6400 6410 6420 

CTGAACAAGTTGCAAGCCTTGTTAGTGATGAACTACTTGATGCTAACACTGTATTAACAG 

6430 6440 6450 6460 6470 6480 

AAGTGGGTTATCAACAAGCTGGTAAAGGCCTTGAACGTATCACGTTAACTGGTGTGGCTA 

6490 6500 6510 6520 6530 6540 

CTGACAGCTATGCATTAACAGCTGGCAATAACATCGATGCTAACTCGGTATTTTTAGTGA 

6550 6560 6570 6580 6590 6600 

GTGGTGGCGCAAAAGGTGTAACTGCACATTGTGTTGCTCGTATAGCTAAAGAATATCAGT 

6610 6620 6630 6640 6650 6660 

CTAAGTTCATCTTATTGGGACGTTCAACGTTCTCAAGTGACGAACCGAGCTGGGCAAGTG 

6670 6680 6690 6700 6710 6720 

GTATTACTGATGAAGCGGCGTTAAAGAAAGCAGCGATGCAGTCTTTGATTACAGCAGGTG 

6730 6740 6750 6760 6770 6780

ATAAACCAACACCCGTTAAGATCGTACAGCTAATCAAACCAATCCAAGCTAATCGTGAAA 

6790 6800 6810 6820 6830 6840 

TTGCGCAAACCTTGTCTGCAATTACCGCTGCTiSGTGGCCAAGCTGAATATGTTTCTGCAG 
6850 6860 6870 6880 6890 6900

ATGTAACTAATGCAGCAAGCGTACAAATGGCAGTCGCTCCAGCTATCGCTAAGTTCGGTG 

6910 6920 6930 6940 6950 6960

CAATCACTGGCATCATTCATGGCGCGGGTGTGTTAGCTGACCAATTCATTGAGCAAAAAA 

6970 6980 6990 7000 7010 7020 

CACTGAGTGATTTTGAGTCTGTTTACAGCACTAAAATTGACGGTTTGTTATCGCTACTAT 

7030 7040 7050 7060 7070 7080 

CAGTCACTGAAGCAAGCAACATCAAGCAATTGGTATTGTTCTCGTCAGCGGCTGGTTTCT

7090 7100 7110 7120 7130 7140 

ACGGTAACCCCGGCCAGTCTGATTACTCGATTGCCAATGAGATCTTAAATAAAACCGCAT 

7150 7160 7170 7180 7190 7200 

ACCGCTTTAAATCATTGCACCCACAAGCTCAAGTATTGAGCTTTAACTGGGGTCCTTGGG 

7210 7220 7230 7240 7250 7260 

ACGGTGGCATGGTAACGCCTGAGCTTAAACGTATGTTTGACCAACGTGGTGTTTACATTA 

7270 7280 7290 7300 7310 7320 

TTCCACTTGATGCAGGTGCACAGTTATTGCTGAATGAACTAGCCGCTAATGATAACCGTT 

7330 7340 7350 7360 7370 7380 

GTCCACAAATCCTCGTGGGTAATGACTTATCTAAAGATGCTAGCTCTGATCAAAAGTCTG 

7390 7400 7410 7420 7430 7440 

ATGAAAAGAGTACTGCTGTAAAAAAGCCACAAGTTAGTCGTTTATCAGATGCTTTAGTAA 

7450 7460 7470 7480 7490 7500 

CTAAAAGTATCAAAGCGACTAACAGTAGCTCTTTATCAAACAAGACTAGTGCTTTATCAG 

7510 7520 7530 7540 7550 7560 

ACAGTAGTGCTTTTCAGGTTAACGAAAACCACTTTTTAGCTGACCACATGATCAAAGGCA 

7570 7580 7590 7600 7610 7620 

ATCAGGTATTACCAACGGTATGCGCGATTGCTTGGATGAGTGATGCAGCAAAAGCGACTT 

7630 7640 7650 7660 7670 7680 

ATAGTAACCGAGACTGTGCATTGAAGTATGTCGGTTTCGAAGACTATAAATTGTTTAAAG 

7690 7700 7710 7720 7730 7740 

GTGTGGTTTTTGATGGCAATGAGGCGGCGGATTACCAAATCCAATTGTCGCCTGTGACAA 

7750 7760 7770 7780 7790 7800 

GGGCGTCAGAACAGGATTCTGAAGTCCGTATTGCCGCAAAGATCTTTAGCCTGAAAAGTG

7810 7820 7830 7840 7850 7860 

ACGGTAAACCTGTGTTTCATTATGCAGCGACAATATTGTTAGCAACTCAGCCACTTAATG 

7870 7880 7890 7900 7910 7920 

CTGTGAAGGTAGAACTTCCGACATTGACAGAAAGTGTTGATAGCAACAATAAAGTAACTG 

7930 7940 7950 7960 7970 7980 

ATGAAGCACAAGCGTTATACAGCAATGGCACCffTGTTCCACGGTGAAAGTCTGCAGGGCA 
7990 8000 8010 8020 8030 8040


TTAAGCAGATATTAAGTTGTGACGACAAGGGCCTGCTATTGGCTTGTCAGATAACCGATG 

8050 8060 8070 8080 8090 8100 

TTGCAACAGCTAAGCAGGGATCCTTCCCGTTAGCTGACAACAATATCTTTGCCAATGATT 

8110 8120 8130 8140 8150 8160

TGGTTTATCAGGCTATGTTGGTCTGGGTGCGCAAACAATTTGGTTTAGGTAGCTTACCTT 

8170 8180 8190 8200 8210 8220 

CGGTGACAACGGCTTGGACTGTGTATCGTGAAGTGGTTGTAGATGAAGTATTTTATCTGC 

8230 8240 8250 8260 8270 8280

AACTTAATGTTGTTGAGCATGATCTATTGGGTTCACGCGGCAGTAAAGCCCGTTGTGATA 

8290 8300 8310 8320 8330 8340 

TTCAATTGATTGCTGCTGATATGCAATTACTTGCCGAAGTGAAATCAGCGCAAGTCAGTG 

8350 8360 8370 8380 8390 8400 

TCAGTGACATTTTGAACGATATGTCATGATCGAGTAAATAATAACGATAGGCGTCATGGT 

8410 8420 8430 8440 8450 8460 

GAGCATGGCGTCTGCTTTCTTCATTTTTTAACATTAACAATATTAATAGCTAAACGCGGT 

8470 8480 8490 8500 8510 8520 

TGCTTTAAACCAAGTAAACAAGTGCTTTTAGCTATTACTATTCCAAACAGGATATTAAAG 

8530 8540 8550 8560 8570 8580 

AGAATATGACGGAATTAGCTGTTATTGGTATGGATGCTAAATTTAGCGGACAAGACAATA 

8590 8600 8610 8620 8630 8640

TTGACCGTGTGGAACGCGCTTTCTATGAAGGTGCTTATGTAGGTAATGTTAGCCGCGTTA 

8650 8660 8670 8680 8690 8700 

GTACCGAATCTAATGTTATTAGCAATGGCGAAGAACAAGTTATTACTGCCATGACAGTTC 

8710 8720 8730 8740 8750 8760 

TTAACTCTGTCAGTCTACTAGCGCAAACGAATCAGTTAAATATAGCTGATATCGCGGTGT 

8770 8780 8790 8800 8810 8820 

TGCTGATTGCTGATGTAAAAAGTGCTGATGATCAGCTTGTAGTCCAAATTGCATCAGCAA 

8830 8840 8850 8860 8870 8880

TTGAAAAACAGTGTGCGAGTTGTGTTGTTATTGCTGATTTAGGCCAAGCATTAAATCAAG 

8890 8900 8910 8920 8930 8940

TAGCTGATTTAGTTAATAACCAAGACTGTCCTGTGGCTGTAATTGGCATGAATAACTCGG 

8950 8960 8970 8980 8990 9000 

TTAATTTATCTCGTCATGATCTTGAATCTGTAACTGCAACAATCAGCTTTGATGAAACCT 

9010 9020 9030 9040 9050 9060 

TCAATGGTTATAACAATGTAGCTGGGTTCGCGAGTTTACTTATCGCTTCAACTGCGTTTG 

9070 9080 9090 9100 9110 9120 

CCAATGCTAAGCAATGTTATATATACGCCAACATTAAGGGCTTCGCTCAATCGGGCGTAA 
9130 9140 9150 9160 9170 9180


ATGCTCAATTTAACGTTGGAAACATTAGCGATACTGCAAAGACCGCATTGCAGCAAGCTA 

9190 9200 9210 9220 9230 9240 

GCATAACTGCAGAGCAGGTTGGTTTGTTAGAAGTGTCAGCAGTCGCTGATTCGGCAATCG 

9250 9260 9270 9280 9290 9300 

CATTGTCTGAAAGCCAAGGTTTAATGTCTGCTTATCATCATACGCAAACTTTGCATACTG 

9310 9320 9330 9340 9350 9360 

CATTAAGCAGTGCCCGTAGTGTGACTGGTGAAGGCGGGTGTTTTTCACAGGTCGCAGGTT 

9370 9380 9390 9400 9410 9420 

TATTGAAATGTGTAATTGGTTTACATCAACGTTATATTCCGGCGATTAAAGATTGGCAAC 

9430 9440 9450 9460 9470 9480 

AACCGAGTGACAATCAAATGTCACGGTGGCGGAATTCACCATTCTATATGCCTGTAGATG 

9490 9500 9510 9520 9530 9540 

CTCGACCTTGGTTCCCACATGCTGATGGCTCTGCACACATTGCCGCTTATAGTTGTGTGA 

9550 9560 9570 9580 9590 9600 

CTGCTGACAGCTATTGTCATATTCTTTTACAAGAAAACGTCTTACAAGAACTTGTTTTGA 

9610 9620 9630 9640 9650 9660 

AAGAAACAGTCTTGCAAGATAATGACTTAACTGAAAGCAAGCTTCAGACTCTTGAACAAA 

9670 9680 9690 9700 9710 9720 

ACAATCCAGTAGCTGATCTGCGCACTAATGGTTACTTTGCATCGAGCGAGTTAGCATTAA 

9730 9740 9750 9760 9770 9780 

TCATAGTACAAGGTAATGACGAAGCACAATTACGCTGTGAATTAGAAACTATTACAGGGC 

9790 9800 9810 9820 9830 9840 

AGTTAAGTACTACTGGCATAAGTACTATCAGTATTAAACAGATCGCAGCAGACTGTTATG 

9850 9860 9870 9880 9890 9900 

CCCGTAATGATACTAACAAAGCCTATAGCGCAGTGCTTATTGCCGAGACTGCTGAAGAGT 

9910 9920 9930 9940 9950 9960 

TAAGCAAAGAAATAACCTTGGCGTTTGCTGGTATCGCTAGCGTGTTTAATGAAGATGCTA 

9970 9980 9990 10000 10010 10020 

AAGAATGGAAAACCCCGAAGGGCAGTTATTTTACCGCGCAGCCTGCAAATAAACAGGCTG 

10030 10040 10050 10060 10070 10080 

CTAACAGCACACAGAATGGTGTCACCTTCATGTACCCAGGTATTGGTGCTACATATGTTG 

10090 10100 10110 10120 10130 10140 

GTTTAGGGCGTGATCTATTTCATCTATTCCCACAGATTTATCAGCCTGTAGCGGCTTTAG 

10150 10160 10170 10180 10190 10200 
CCGATGACATTGGCGAAAGTCTAAAAGATACTTTACTTAATCCACGCAGTATTAGTCGTC 

10210 10220 10230 10240 10250 10260
ATAGCTTTAAAGAACTCAAGCAGTTGGATCTQGACCTGCGCGGTAACTTAGCCAATATCG 
10270 10280 10290 10300 10310 10320

CTGAAGCCGGTGTGGGTTTTGCTTGTGTGTTTACCAAGGTATTTGAAGAAGTCTTTGCCG 

10330 10340 10350 10360 10370 10380 

TTAAAGCTGACTTTGCTACAGGTTATAGCATGGGTGAAGTAAGCATGTATGCAGCACTAG 

10390 10400 10410 10420 10430 10440 

GCTGCTGGCAGCAACCGGGATTGATGAGTGCTCGCCTTGCACAATCGAATACCTTTAATC 

10450 10460 10470 10480 10490 10500 

ATCAACTTTGCGGCGAGTTAAGAACACTACGTCAGCATTGGGGCATGGATGATGTAGCTA 

10510 10520 10530 10540 10550 10560 

ACGGTACGTTCGAGCAGATCTGGGAAACCTATACCATTAAGGCAACGATTGAACAGGTCG 

10570 10580 10590 10600 10610 10620 

AAATTGCCTCTGCAGATGAAGATCGTGTGTATTGCACCATTATCAATACACCTGATAGCT 

10630 10640 10650 10660 10670 10680 

TGTTGTTAGCCGGTTATCCAGAAGCCTGTCAGCGAGTCATTAAGAATTTAGGTGTGCGTG 

10690 10700 10710 10720 10730 10740 

CAATGGCATTGAATATGGCGAACGCAATTCACAGCGCGCCAGCTTATGCCGAATACGATC 

10750 10760 10770 10780 10790 10800 

ATATGGTTGAGCTATACCATATGGATGTTACTCCACGTATTAATACCAAGATGTATTCAA 

10810 10820 10830 10840 10850 10860 

GCTCATGTTATTTACCGATTCCACAACGCAGCAAAGCGATTTCCCACAGTATTGCTAAAT 

10870 10880 10890 10900 10910 10920 

GTTTGTGTGATGTGGTGGATTTCCCACGTTTGGTTAATACCTTACATGACAAAGGTGCGC 

10930 10940 10950 10960 10970 10980 

GGGTATTCATTGAAATGGGTCCAGGTCGTTCGTTATGTAGCTGGGTAGATAAGATCTTAG 

10990 11000 11010 11020 11030 11040 

TTAATGGCGATGGCGATAATAAAAAGCAAAGCCAACATGTATCTGTTCCTGTGAATGCCA 

11050 11060 11070 11080 11090 11100 

AAGGCACCAGTGATGAACTTACTTATATTCGTGCGATTGCTAAGTTAATTAGTCATGGCG 

11110 11120 11130 11140 11150 11160 

TGAATTTGAATTTAGATAGCTTGTTTAACGGGTCAATCCTGGTTAAAGCAGGCCATATAG 

11170 11180 11190 11200 11210 11220 

CAAACACGAACAAATAGTCAACATCGATATCTAGCGCTGGTGAGTTATACCTCATTAGTT 

11230 11240 11250 11260 11270 11280 

GAAATATGGATTTAAAGAGAGTAATTATGGAAAATATTGCAGTAGTAGGTATTGCTAATT 

11290 11300 11310 11320 11330 11340 
TGTTCCCGGGCTCACAAGCACCGGATCAATTTTGGCAGCAATTGCTTGAACAACAAGATT 

11350 11360 11370 11380 11390 11400 

GCCGCAGTAAGGCGACCGCTGTTCAAATGGGGGTTGATCCTGCTAAATATACCGCCAACA 
11410 11420 11430 11440 11450 11460

AAGGTGACACAGATAAATTTTACTGTGTGCACGGCGGTTACATCAGTGATTTCAATTTTG 

11470 11480 11490 11500 11510 11520 

ATGCTTCAGGTTATCAACTCGATAATGATTATTTAGCCGGTTTAGATGACCTTAATCAAT 

11530 11540 11550 11560 11570 11580 

GGGGGCTTTATGTTACGAAACAAGCCCTTACCGATGCGGGTTATTGGGGCAGTACTGCAC 

11590 11600 11610 11620 11630 11640 

TAGAAAACTGTGGTGTGATTTTAGGTAATTTGTCATTCCCAACTAAATCATCTAATCAGC 

11650 11660 11670 11680 11690 11700 
TGTTTATGCCTTTGTATCATCAAGTTGTTGATAATGCCTTAAAGGCGGTATTACATCCTG 

11710 11720 11730 11740 11750 11760 

ATTTTCAATTAACGCATTACACAGCACCGAAAAAAACACATGCTGACAATGCATTAGTAG 

11770 11780 11790 11800 11810 11820 

CAGGTTATCCAGCTGCATTGATCGCGCAAGCGGCGGGTCTTGGTGGTTCACATTTTGCAC 

11830 11840 11850 11860 11870 11880 

TGGATGCGGCTTGTGCTTCATCTTGTTATAGCGTTAAGTTAGCGTGTGATTACCTGCATA 

11890 11900 11910 11920 11930 11940 

CGGGTAAAGCCAACATGATGCTTGCTGGTGCGGTATCTGCAGCAGATCCTATGTTCGTAA 

11950 11960 11970 11980 11990 12000 

ATATGGGTTTCTCGATATTCCAAGCTTACCCAGCTAACAATGTACATGCCCCGTTTGACC 

12010 12020 12030 12040 12050 12060 

AAAATTCACAAGGTCTATTTGCCGGTGAAGGCGCGGGCATGATGGTATTGAAACGTCAAA 

12070 12080 12090 12100 12110 12120 

GTGATGCAGTACGTGATGGTGATCATATTTACGCCATTATTAAAGGCGGCGCATTATCGA 

12130 12140 12150 12160 12170 12180 

ATGACGGTAAAGGCGAGTTTGTATTAAGCCCGAACACCAAGGGCCAAGTATTAGTATATG 

12190 12200 12210 12220 12230 12240 

AACGTGCTTATGCCGATGCAGATGTTGACCCGAGTACAGTTGACTATATTGAATGTCATG 

12250 12260 12270 12280 12290 12300 

CAACGGGCACACCTAAGGGTGACAATGTTGAATTGCGTTCGATGGAAACCTTTTTCAGTC 

12310 12320 12330 12340 12350 12360 

GCGTAAATAACAAACCATTACTGGGCTCGGTTAAATCTAACCTTGGTCATTTGTTAACTG 

12370 12380 12390 12400 12410 12420 

CCGCTGGTATGCCTGGCATGACCAAAGCTATGTTAGCGCTAGGTAAAGGTCTTATTCCTG 

12430 12440 12450 12460 12470 12480 

CAACGATTAACTTAAAGCAACCACTGCAATCTAAAAACGGTTACTTTACTGGCGAGCAAA 

12490 12500 12510 12520 12530 12540 

TGCCAACGACGACTGTGTCTTGGCCAACAAC1CCGGGTGCCAAGGCAGATAAACCGCGTA 
12550 12560 12570 12580 12590 12600

CCGCAGGTGTGAGCGTATTTGGTTTTGGTGGCAGCAACGCCCATTTGGTATTACAACAGC 

12610 12620 12630 12640 12650 12660 

CAACGCAAACACTCGAGACTAATTTTAGTGTTGCTAAACCACGTGAGCCTTTGGCTATTA 

12670 12680 12690 12700 12710 12720 

TTGGTATGGACAGCCATTTTGGTAGTGCCAGTAATTTAGCGCAGTTCAAAACCTTATTAA 

12730 12740 12750 12760 12770 12780 

ATAATAATCAAAATACCTTCCGTGAATTACCAGAACAACGCTGGAAAGGCATGGAAAGTA 

12790 12800 12810 12820 12830 12840 

ACGCTAACGTCATGCAGTCGTTACAATTACGCAAAGCGCCTAAAGGCAGTTACGTTGAAC 

12850 12860 12870 12880 12890 12900 

AGCTAGATATTGATTTCTTGCGTTTTAAAGTACCGCCTAATGAAAAAGATTGCTTGATCC 

12910 12920 12930 12940 12950 12960 

CGCAACAGTTAATGATGATGCAAGTGGCAGACAATGCTGCGAAAGACGGAGGTCTAGTTG 

12970 12980 12990 13000 13010 13020 

AAGGTCGTAATGTTGCGGTATTAGTAGCGATGGGCATGGAACTGGAATTACATCAGTATC 

13030 13040 13050 13060 13070 13080 

GTGGTCGCGTTAATCTAACCACCCAAATTGAAGACAGCTTATTACAGCAAGGTATTAACC 

13090 13100 13110 13120 13130 13140 

TGACTGTTGAGCAACGTGAAGAACTGACCAATATTGCTAAAGACGGTGTTGCCTCGGCTG 

13150 13160 13170 13180 13190 13200 

CACAGCTAAATCAGTATACGAGTTTCATTGGTAATATTATGGCGTCACGTATTTCGGCGT 

13210 13220 13230 13240 13250 13260 

TATGGGATTTTTCTGGTCCTGCTATTACCGTATCGGCTGAAGAAAACTCTGTTTATCGTT 

13270 13280 13290 13300 13310 13320 

GTGTTGAATTAGCTGAAAATCTATTTCAAACCAGTGATGTTGAAGCCGTTATTATTGCTG 

13330 13340 13350 13360 13370 13380 
CTGTTGATTTGTCTGGTTCAATTGAAAACATTACTTTACGTCAGCACTACGGTCCAGTTA 

13390 13400 13410 13420 13430 13440 

ATGAAAAGGGATCTGTAAGTGAATGTGGTCCGGTTAATGAAAGCAGTTCAGTAACCAACA 

13450 13460 13470 13480 13490 13500 

ATATTCTTGATCAGCAACAATGGCTGGTGGGTGAAGGCGCAGCGGCTATTGTCGTTAAAC 

13510 13520 13530 13540 13550 13560 

CGTCATCGCAAGTCACTGCTGAGCAAGTTTATGCGCGTATTGATGCGGTGAGTTTTGCCC 

13570 13580 13590 13600 13610 13620 

CTGGTAGCAATGCGAAAGCAATTACGATTGCAGCGGATAAAGCATTAACACTTGCTGGTA 

13630 13640 13650 13660 13670 13680 
TCAGTGCTGCTGATGTAGCTAGTGTTGAAGCACATGCAAGTGGTTTTAGTGCCGAAAATA 
13690 13700 13710 13720 13730 13740

ATGCTGAAAAAACCGCGTTACCGACTTTATACCCAAGCGCAAGTATCAGTTCGGTGAAAG 

13750 13760 13770 13780 13790 13800 

CCAATATTGGTCATACGTTTAATGCCTCGGGTATGGCGAGTATTATTAAAACGGCGCTGC 

13810 13820 13830 13840 13850 13860 
TGTTAGATCAGAATACGAGTCAAGATCAGAAAAGCAAACATATTGCTATTAACGGTCTAG

13870 13880 13890 13900 13910 13920 

GTCGTGATAACAGCTGCGCGCATCTTATCTTATCGAGTTCAGCGCAAGCGCATCAAGTTG 

13930 13940 13950 13960 13970 13980 
CACCAGCGCCTGTATCTGGTATGGCCAAGCAACGCCCACAGTTAGTTAAAACCATCAAAC 

13990 14000 14010 14020 14030 14040 

TCGGTGGTCAGTTAATTAGCAACGCGATTGTTAACAGTGCGAGTTCATCTTTACACGCTA 

14050 14060 14070 14080 14090 14100 

TTAAAGCGCAGTTTGCCGGTAAGCACTTAAACAAAGTTAACCAGCCAGTGATGATGGATA 

14110 14120 14130 14140 14150 14160 

ACCTGAAGCCCCAAGGTATTAGCGCTCATGCAACCAATGAGTATGTGGTGACTGGAGCTG 

14170 14180 14190 14200 14210 14220 

CTAACACTCAAGCTTCTAACATTCAAGCATCTCATGTTCAAGCGTCAAGTCATGCACAAG 

14230 14240 14250 14260 14270 14280 

AGATAGCACCAAACCAAGTTCAAAATATGCAAGCTACAGCAGCCGCTGTAAGTTCACCCC 

14290 14300 14310 14320 14330 14340

TTTCTCAACATCAACACACAGCGCAGCCCGTAGCGGCACCGAGCGTTGTTGGAGTGACTG 

14350 14360 14370 14380 14390 14400 

TGAAACATAAAGCAAGTAACCAAATTCATCAGCAAGCGTCTACGCATAAAGCATTTTTAG 

14410 14420 14430 14440 14450 14460 

AAAGTCGTTTAGCTGCACAGAAAAACCTATCGCAACTTGTTGAATTGCAAACCAAGCTGT 

14470 14480 14490 14500 14510 14520

CAATCCAAACTGGTAGTGACAATACATCTAACAATACTGCGTCAACAAGCAATACAGTGC 

14530 14540 14550 14560 14570 14580 

TAACAAATCCTGTATCAGCAACGCCATTAACACTTGTGTCTAATGCGCCTGTAGTAGCGA 

14590 14600 14610 14620 14630 14640 

CAAACCTAACCAGTACAGAAGCAAAAGCGCAAGCAGCTGCTACACAAGCTGGTTTTCAGA 

14650 14660 14670 14680 14690 14700 

TAAAAGGACCTGTTGGTTACAACTATCCACCGCTGCAGTTAATTGAACGTTATAATAAAC 

14710 14720 14730 14740 14750 14760 

CAGAAAACGTGATTTACGATCAAGCTGATTTGGTTGAATTCGCTGAAGGTGATATTGGTA 

14770 14780 14790 14800 14810 14820 

AGGTATTTGGTGCTGAATACAATATTATTGATCGCTATTCGCGTCGTGTACGTCTGCCAA 
14830 14840 14850 14860 14870 14880


CCTCAGATTACTTGTTAGTAACACGTGTTACTGAACTTGATGCCAAGGTGCATGAATACA 

14890 14900 14910 14920 14930 14940 

AGAAATCATACATGTGTACTGAATATGATGTGCCTGTTGATGCACCGTTCTTAATTGATG 

14950 14960 14970 14980 14990 15000 

GTCAGATCCCTTGGTCTGTTGCCGXCGAATCAGGCCAGTGTGATTTGATGTTGATTTCAT 

15010 15020 15030 15040 15050 15060 
ATATCGGTATTGATTTCCAAGCGAAAGGCGAACGTGTTTACCGTTTACTTGATTGTGAAT 

15070 15080 15090 15100 15110 15120

TAACTTTCCTTGAAGAGATGGCTTTTGGTGGCGATACTTTACGTTACGAGATCCACATTG

15130 15140 15150 15160 15170 15180 

ATTCGTATGCACGTAACGGCGAGCAATTATTATTCTTCTTCCATTACGATTGTTACGTAG 

15190 15200 15210 15220 15230 15240 

GGGATAAGAAGGTACTTATCATGCGTAATGGTTGTGCTGGTTTCTTTACTGACGAAGAAC 

15250 15260 15270 15280 15290 15300 

TTTCTGATGGTAAAGGCGTTATTCATAACGACAAAGACAAAGCTGAGTTTAGCAATGCTG 

15310 15320 15330 15340 15350 15360 

TTAAATCATCATTCACGCCGTTATTACAACATAACCGTGGTCAATACGATTATAACGACA 

15370 15380 15390 15400 15410 15420 

TGATGAAGTTGGTTAATGGTGATGTTGCCAGTTGTTTTGGTCCGCAATATGATCAAGGTG 

15430 15440 15450 15460 15470 15480 

GCCGTAATCCATCATTGAAATTCTCGTCTGAGAAGTTCTTGATGATTGAACGTATTACCA 

15490 15500 15510 15520 15530 15540 

AGATAGACCCAACCGGTGGTCATTGGGGACTAGGCCTGTTAGAAGGTCAGAAAGATTTAG 

15550 15560 15570 15580 15590 15600 

ACCCTGAGCATTGGTATTTCCCTTGTCACTTTAAAGGTGATCAAGTAATGGCTGGTTCGT 

15610 15620 15630 15640 15650 15660 

TGATGTCGGAAGGTTGTGGCCAAATGGCGATGTTCTTCATGCTGTCTCTTGGTATGCATA 

15670 15680 15690 15700 15710 15720 

CCAATGTGAACAACGCTCGTTTCCAACCACTACCAGGTGAATCACAAACGGTACGTTGTC 

15730 15740 15750 15760 15770 15780 

GTGGGCAAGTACTGCCACAGCGCAATACCTTAACTTACCGTATGGAAGTTACTGCGATGG 

15790 15800 15810 15820 15830 15840 

GTATGCATCCACAGCCATTCATGAAAGCTAATATTGATATTTTGCTTGACGGTAAAGTGG 

15850 15860 15870 15880 15890 15900 

TTGTTGATTTCAAAAACTTGAGCGTGATGATCAGCGAACAAGATGAGCATTCAGATTACC 


15910 15920 15930 15940 15950 15960 

CTGTAACACTGCCGAGTAATGTGGCGCTTAAAPCGATTACTGCACCTGTTGCGTCAGTAG 
15970 15980 15990 16000 16010 16020


CACCAGCATCTTCACCCGCTAACAGCGCGGATCTAGACGAACGTGGTGTTGAACCGTTTA 

16030 16040 16050 16060 16070 16080 

AGTTTCCTGAACGTCCGTTAATGCGTGTTGAGTCAGACTTGTCTGCACCGAAAAGCAAAG 

16090 16100 16110 16120 16130 16140 

GTGTGACACCGATTAAGCATTTTGAAGCGCCTGCTGTTGCTGGTCATCATAGAGTGCCTA 

16150 16160 16170 16180 16190 16200 

ACCAAGCACCGTTTACACCTTGGCATATGTTTGAGTTTGCGACGGGTAATATTTCTAACT 

16210 16220 16230 16240 16250 16260 
GTTTCGGTCCTGATTTTGATGTTTATGAAGGTCGTATTCCACCTCGTACACCTTGTGGCG 

16270 16280 16290 16300 16310 16320 

ATTTACAAGTTGTTACTCAGGTTGTAGAAGTGCAGGGCGAACGTCTTGATCTTAAAAATC

16330 16340 16350 16360 16370 16380 

CATCAAGCTGTGTAGCTGAATACTATGTACCGGAAGACGCTTGGTACTTTACTAAAAACA 

16390 16400 16410 16420 16430 16440 

GCCATGAAAACTGGATGCCTTATTCATTAATCATGGAAATTGCATTGCAACCAAATGGCT 

16450 16460 16470 16480 16490 16500 

rTATTTCTGGTTACATGGGCACGACGCTTAAATACCCTGAAAAAGATCTGTTCTTCCGTA 

16510 16520 16530 16540 16550 16560 

ACCTTGATGGTAGCGGCACGTTATTAAAGCAGATTGATTTACGCGGCAAGACCATTGTGA 

16570 16580 16590 16600 16610 16620 

ATAAATCAGTCTTGGTTAGTACGGCTATTGCTGGTGGCGCGATTATTCAAAGTTTCACGT

16630 16640 16650 16660 16670 16680

TTGATATGTCTGTAGATGGCGAGCTATTTTATACTGGTAAAGCTGTATTTGGTTACTTTA 

16690 16700 16710 16720 16730 16740 

GTGGTGAATCACTGACTAACCAACTGGGCATTGATAACGGTAAAACGACTAATGCGTGGT 

16750 16760 16770 16780 16790 16800 

TTGTTGATAACAATACCCCCGCAGCGAATATTGATGTGTTTGATTTAACTAATCAGTCAT 

16810 16820 16830 16840 16850 16860 

TGGCTCTGTATAAAGCGCCTGTGGATAAACCGCATTATAAATTGGCTGGTGGTCAGATGA 

16870 16880 16890 16900 16910 16920 

ACTTTATCGATACAGTGTCAGTGGTTGAAGGCGGTGGTAAAGCGGGCGTGGCTTATGTTT 

16930 16940 16950 16960 16970 16980 

ATGGCGAACGTACGATTGATGCTGATGATTGGTTCTTCCGTTATCACTTCCACCAAGATC 

16990 17000 17010 17020 17030 17040 

CGGTGATGCCAGGTTCATTAGGTGTTGAAGCTATTATTGAGTTGATGCAGACCTATGCGC 

17050 17060 17070 17080 17090 17100

TTAAAAATGATTTGGGTGGCAAGTTTGCTAACCCACGTTTCATTGCGCCGATGACGCAAG 





17110 17120 17130 17140 17150 17160


TTGATTGGAAATACCGTGGGCAAATTACCCCGCTGAATAAACAGATGTCACTGGACGTGC 

17170 17180 17190 17200 17210 17220 

ATATCACTGAGATCGTGAATGACGCTGGTGAAGTGCGAATCGTTGGTGATGCGAATCTGT 

17230 17240 17250 17260 17270 17280 

CTAAAGATGGTCTGCGTATTTATGAAGTTAAAAACATCGTTTTAAGTATTGTTGAAGCGT 

17290 17300 17310 17320 17330 17340 

AAAGGGTCAAGTGTAACGTGCTTAAGCGCCGCATTGGTTAAAGACGCTTTGCACGCCGTG 

17350 17360 17370 17380 17390 17400

AATCCGTCCATGGAGGCTTGGGGTTGGCATCCATGCCAACAACAGCAAGCTTACTTTAAT 

17410 17420 17430 17440 17450 17460 

CAATACGGCTTGGTGTCCATTTAGACGCCTCGAACTTAGTAGTTAATAGACAAAATAATT 

17470 17480 17490 17500 17510 17520 

TAGCTGTGGAATGAATATAGTAAGTAATCATTCGGCAGCTACAAAAAAGGAATTAAGAAT 

17530 17540 17550 17560 17570 17580 

GTCGAGTTTAGGTTTTAACAATAACAACGCAATTAACTGGGCTTGGAAAGTAGATCCAGC 

17590 17600 17610 17620 17630 17640 

GTCAGTTCATACACAAGATGCAGAAATTAAAGCAGCTTTAATGGATCTAACTAAACCTCT 

17650 17660 17670 17680 17690 17700 

CTATGTGGCGAATAATTCAGGCGTAACTGGTATAGCTAATCATACGTCAGTAGCAGGTGC 

17710 17720 17730 17740 17750 17760 

GATCAGCAATAACATCGATGTTGATGTATTGGCGTTTGCGCAAAAGTTAAACCCAGAAGA 

17770 17780 17790 17800 17810 17820 

TCTGGGTGATGATGCTTACAAGAAACAGCACGGCGTTAAATATGCTTATCATGGCGGTGC 

17830 17840 17850 17860 17870 17880 

GATGGCAAATGGTATTGCCTCGGTTGAATTGGTTGTTGCGTTAGGTAAAGCAGGGCTGTT 

17890 17900 17910 17920 17930 17940 

ATGTTCATTTGGTGCTGCAGGTCTAGTGCCTGATGCGGTTGAAGATGCAATTCGTCGTAT 

17950 17960 17970 17980 17990 18000 

TCAAGCTGAATTACCAAATGGCCCTTATGCGGTTAACTTGATCCATGCACCAGCAGAAGA 

18010 18020 18030 18040 18050 18060 

AGCATTAGAGCGTGGCGCGGTTGAACGTTTCCTAAAACTTGGCGTCAAGACGGTAGAGGC 

18070 18080 18090 18100 18110 18120 

TTCAGCTTACCTTGGTTTAACTGAACACATTGTTTGGTATCGTGCTGCTGGTCTAACTAA 

18130 18140 18150 18160 18170 18180 

AAACGCAGATGGCAGTGTTAATATCGGTAACAAGGTTATCGCTAAAGTATCGCGTACCGA 

18190 18200 18210 18220 18230 18240 
AGTTGGTCGCCGCTTTATGGAACCTGCACCGQAAAAATTACTGGATAAGTTATTAGAACA 
18250 18260 18270 18280 18290 18300


AAATAAGATCACCCCTGAACAAGCTGCTTTAGCGTTGCTTGTACCTATGGCTGATGATAT 

18310 18320 18330 18340 18350 18360 
TACTGGGGAAGCGGATTCTGGTGGTCATACAGATAACCGTCCGTTTTTAACATTATTACC 

18370 18380 18390 18400 18410 18420 

GACGATTATTGGTCTGCGTGATGAAGTGCAAGCGAAGTATAACTTCTCTCCTGCATTACG 

18430 18440 18450 18460 18470 18480 

TGTTGGTGCTGGTGGTGGTATCGGAACGCCTGAAGCAGCACTCGCTGCATTTAACATGGG 

18490 18500 18510 18520 18530 18540 

CGCGGCTTATATCGTTCTGGGTTCTGTGAATCAGGCGTGTGTTGAAGCGGGTGCATCTGA 

18550 18560 18570 18580 18590 18600 

ATATACTCGTAAACTGTTATCGACAGTTGAAATGGCTGATGTGACTATGGCACCTGCTGC 

18610 18620 18630 18640 18650 18660 

AGATATGTTTGAAATGGGTGTGAAGCTGCAAGTATTAAAACGCGGTTCTATGTTCGCGAT 

18670 18680 18690 18700 18710 18720 

GCGTGCGAAGAAACTGTATGACTTGTATGTGGCTTATGACTCGATTGAAGATATCCCAGC 

18730 18740 18750 18760 18770 18780 

TGCTGAACGTGAGAAGATTGAAAAACAAATCTTCCGTGCAAACCTAGACGAGATTTGGGA 

18790 18800 18810 18820 18830 18840 

TGGCACTATCGCTTTCTTTACTGAACGCGATCCAGAAATGCTAGCCCGTGCAACGAGTAG 

18850 18860 18870 18880 18890 18900

TCCTAAACGTAAAATGGCACTTATCTTCCGTTGGTATCTTGGCCTTTCTTCACGCTGGTC 

18910 18920 18930 18940 18950 18960 

AAACACAGGCGAGAAGGGACGTGAAATGGATTATCAGATTTGGGCAGGCCCAAGTTTAGG 

18970 18980 18990 19000 19010 19020 

TGCATTCAACAGCTGGGTGAAAGGTTCTTACCTTGAAGACTATACCCGCCGTGGCGCTGT 

19030 19040 19050 19060 19070 19080

AGATGTTGCTTTGCATATGCTTAAAGGTGCTGCGTATTTACAACGTGTAAACCAGTTGAA 

19090 19100 19110 19120 19130 19140

ATTGCAAGGTGTTAGCTTAAGTACAGAATTGGCAAGTTATCGTACGAGTGATTAATGTTA 

19150 19160 19170 19180 19190 19200 

CTTGATGATATGTGAATTAATTAAAGCGCCTGAGGGCGCTTTTTTTGGTTTTTAACTCAG 

19210 19220 
GTGTTGTAACTCGAAATTGCCCCTTTC 
fGIUU:fQTGtC6ReATQCAGCAfOTGaU3GCCCICTTQCTG»

190 200 210 220 230 240
370 380 390 400 410 420

TTAMC0»nTQMGJ^C»GAGC6TmTGQC0AC^ 
AGQCTMl36CACT%AAAOGTGC7T»a3ACG]lYaCA^

550 560 570 580 590 600

T3C7T6MGCCCftOGG(»CAG6CIUCafiaU^^ 

610 620 630 640 650 660

670 680 690 700 710 720 

CAC&GATTC(3VCACACXlUVASCaACAGOOG6^ 

730 740 750 760 770 780

790 800 810 820 830 840 

AXAfSGAAOJUACGCCTVTCTAOCTCJUiTACACASA^ 

850 860 870 880 

ASGGTACACCGCCSSCGTQCXGGZmAGCTCATTO^^ 